A NOVEL HAEMOPOIETIN RECEPTOR AND GENETIC
SEQUENCES ENCODING SAME

The present invention relates generally to a novel
haemopoietin receptor or derivatives thereof and to
genetic sequences encoding same. Interaction between
the novel receptor of the present invention and a ligand
facilitates proliferation, differentiation and survival
of a wide variety of cells. The novel receptor and its
derivatives and the genetic sequences encoding same of
the present invention are useful in the development of a
wide range of agonists, antagonists, therapeutics and
diagnostic reagents based on ligand interaction with its
receptor.

Bibliographic details of the publications numerically 
referred to in this specification are collected at the 
end of the description. Sequence Identity Numbers (SEQ 
ID NOs.) for the nucleotide and amino acid sequences 
referred to in the specification are defined following
the bibliography. 

Throughout this specification and the claims which 
follow, unless the context requires otherwise, the word
"comprise", or variations such as "comprises" or
"comprising", will be understood to imply the inclusion
of a stated integer or group of integers but not the 
exclusion of any other integer or group of integers. 

The rapidly increasing sophistication of recombinant DNA
techniques is greatly facilitating research into the
medical and allied health fields. Cytokine research is
of particular importance, especially as these molecules
regulate the proliferation, differentiation and function
of a wide variety of cells. Administration of
recombinant cytokines or regulating cytokine function
and/or synthesis is becoming increasingly the focus of
medical research into the treatment of a range of
disease conditions.

Despite the discovery of a range of cytokines and other
secreted regulators of cell function, comparatively few
cytokines are directly used or targeted in therapeutic
regimens. One reason for this is the pleiotropic nature
of many cytokines. For example, interleukin (IL)-11 is
a functionally pleiotropic molecule (1,2), initially
characterized by its ability to stimulate proliferation
of the IL-6-dependent plasmacytoma cell line, T1165
(3). Other biological actions of IL-11 include
induction of multipotential haemopoietin progenitor cell
proliferation (4,5,6), enhancement of megakaryocyte and
platelet formation (7,8,9,10), stimulation of acute
phase protein synthesis (11) and inhibition of adipocyte
lipoprotein lipase activity (12,13).

Other important cytokines in the IL-11 group include
IL-6, leukaemia inhibitory factor (LIF), oncostatin M (OSM)
and CNTF. All these cytokines exhibit pleiotropic
properties with significant activities in proliferation,
differentiation and survival of cells. Members of the
haemopoietin receptor family are defined by the presence
of a conserved amino acid domain in their extracellular
region. However, despite the low level of amino acid
sequence conservation between other haemopoietin
receptor domains of different receptors, they are all
predicted to assume a similar tertiary structure,
centred around two fibronectin-type III repeats (18,19).

The size of the haemopoietin receptor family has now
become extensive and includes the cell surface receptors
for many cytokines including interleukin-2 (IL-2), IL-3,
IL-4, IL-5, IL-6, IL-7, IL-9, IL-11, IL-12, IL-13,
IL-15, granulocyte colony stimulating factor (G-CSF),
granulocyte-macrophage-CSF (GM-CSF), erythropoietin,
thrombopoietin, leptin, leukaemia inhibitory factor,
oncostatin-M, ciliary neurotrophic factor,
cardiotrophin, growth hormone and prolactin. Although
most of the members of the haemopoietin receptor family
act as classic cell surface receptors, binding their
cognate ligand at the cell surface and initiating
intracellular signal transduction, some receptors are
also produced in naturally occurring soluble forms.
These soluble receptors can act either as cytokine
antagonists, by binding to cytokines and inhibiting
productive interactions with cell surface receptors (e.g.
LIF binding protein (20)), or as agonists, binding to
cytokine and potentiating interaction with cell surface
receptor components (e.g. soluble interleukin-6 receptor
α-chain (21)). Still other members of the family appear
to be produced only as secreted proteins, with no
evidence of a cell surface form. In this regard, the
IL-12 p40 subunit is a useful example. The cytokine
IL-12 is secreted as a heterodimer composed of a p35
subunit which shows similarity to cytokines such as IL-6
(22) and a p40 subunit which shares similarity with the
IL-6 receptor α-chain (23). In this case the soluble
receptor acts as part of the cytokine itself and is
essential to formation of an active protein. In
addition to acting as cytokines (e.g. IL-12 p40), cytokine
agonists (e.g. IL-6 receptor α-chain) or cytokine
antagonists (e.g. LIF binding protein), members of the
haemopoietin receptor family have been useful in the
discovery of small molecule cytokine mimetics. For
example, the

discovery of peptide mimetics of two commercially
valuable cytokines, erythropoietin and thrombopoietin,
centred on the selection of peptides capable of binding
to soluble versions of the erythropoietin and
thrombopoietin receptors (24,25). Due to the importance
and multifactorial nature of these cytokines, there is a
need to identify receptors, including both cell bound
and soluble, for pleiotropic cytokines. Identification
of such receptors permits the identification of
pleiotropic cytokines and the development of a range of
therapeutic and diagnostic agents.

Accordingly, one aspect of the present invention relates 
to a nucleic acid molecule comprising a sequence of 
nucleotides encoding or complementary to a sequence 
encoding a novel haemopoietin receptor or a derivative 
thereof.



More particularly, the present invention provides a 
nucleic acid molecule comprising a sequence of 
nucleotides encoding or complementary to a sequence 
encoding a novel haemopoietin receptor or a derivative 
thereof having the motif:

Trp Ser Xaa Trp Ser [SEQ ID NO:1],
wherein Xaa is any amino acid and is preferably Asp or
Glu.
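
By way of illustration only, the motif of SEQ ID NO:1 may be located in a candidate amino acid sequence with a simple pattern search. The following Python sketch assumes one-letter amino acid codes and the twenty standard residues for Xaa. The function name `find_wsxws` and the test sequence are hypothetical and do not form part of the specification.

```python
import re

# Trp-Ser-Xaa-Trp-Ser in one-letter code; Xaa is restricted here to the
# twenty standard residues (an assumption made for this sketch).
WSXWS = re.compile(r"WS([ACDEFGHIKLMNPQRSTVWY])WS")
PREFERRED_XAA = {"D", "E"}  # Asp or Glu, the preferred residues for Xaa


def find_wsxws(protein):
    """Return (0-based position, motif, xaa_is_preferred) for each occurrence."""
    return [
        (m.start(), m.group(0), m.group(1) in PREFERRED_XAA)
        for m in WSXWS.finditer(protein.upper())
    ]


if __name__ == "__main__":
    toy_sequence = "MKTLLWSEWSPAGQWSAWSV"  # invented sequence, illustration only
    for hit in find_wsxws(toy_sequence):
        print(hit)
```

The third element of each result distinguishes occurrences in which Xaa is Asp or Glu from occurrences in which Xaa is some other residue.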



Even more particularly, the present invention is 
directed to a nucleic acid molecule comprising a 
sequence of nucleotides encoding or complementary to a 
sequence encoding a novel haemopoietin receptor or a 
derivative thereof, said receptor comprising the motif: 

Trp Ser Xaa Trp Ser [SEQ ID NO:1]

wherein Xaa is any amino acid and is preferably Asp or
Glu, said nucleic acid molecule is identifiable by
hybridisation to said molecule under low stringency
conditions at 42°C with

5' (A/G)CTCCA(A/G)TC(A/G)CTCCA 3' [SEQ ID NO:7]
and

5' (A/G)CTCCA(C/T)TC(A/G)CTCCA 3' [SEQ ID NO:8].
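
As a non-limiting check on the oligonucleotides of SEQ ID NO:7 and SEQ ID NO:8, each degenerate sequence may be expanded into its individual members and the reverse complement of each member translated. The Python sketch below assumes the standard genetic code and reads each bracketed alternative, such as (A/G), as a choice of one base at that position. The helper names are illustrative only. Under those assumptions every member of SEQ ID NO:7 is antisense to a sequence encoding Trp Ser Asp Trp Ser, and every member of SEQ ID NO:8 is antisense to a sequence encoding Trp Ser Glu Trp Ser.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SEQ_ID_NO_7 = "(A/G)CTCCA(A/G)TC(A/G)CTCCA"
SEQ_ID_NO_8 = "(A/G)CTCCA(C/T)TC(A/G)CTCCA"


def parse_degenerate(text):
    """Turn '(A/G)CT...' into a list of per-position base choices."""
    positions, i = [], 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            positions.append(text[i + 1:j].split("/"))
            i = j + 1
        else:
            positions.append([text[i]])
            i += 1
    return positions


def expand(positions):
    """List every concrete oligonucleotide covered by the degenerate sequence."""
    return ["".join(p) for p in product(*positions)]


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate in frame 0, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def encoded_peptides(oligo):
    """Peptides encoded by the strand complementary to each oligo member."""
    return {translate(reverse_complement(s)) for s in expand(parse_degenerate(oligo))}


if __name__ == "__main__":
    print(encoded_peptides(SEQ_ID_NO_7))
    print(encoded_peptides(SEQ_ID_NO_8))
```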



Still more particularly, the present invention provides
an isolated nucleic acid molecule comprising a sequence
of nucleotides substantially as set forth in SEQ ID
NO:12 or a nucleotide sequence having at least 60%
similarity to the nucleotide sequence set forth in SEQ
ID NO:12 or a nucleotide sequence capable of hybridising
thereto under low stringency conditions at 42°C and
wherein said nucleotide sequence encodes a novel
haemopoietin receptor or a derivative thereof.

In a related embodiment, the present invention provides
an isolated nucleic acid molecule comprising a sequence
of nucleotides substantially as set forth in SEQ ID
NO:14 or a nucleotide sequence having at least 60%
similarity to the nucleotide sequence set forth in SEQ
ID NO:14 or a nucleotide sequence capable of hybridising
thereto under low stringency conditions at 42°C and
wherein said nucleotide sequence encodes a novel
haemopoietin receptor or a derivative thereof.

In another related embodiment, the present invention
provides an isolated nucleic acid molecule comprising a
sequence of nucleotides substantially as set forth in
SEQ ID NO:16 or a nucleotide sequence having at least
60% similarity to the nucleotide sequence set forth in
SEQ ID NO:16 or a nucleotide sequence capable of
hybridising thereto under low stringency conditions at
42°C and wherein said nucleotide sequence encodes a
novel haemopoietin receptor or a derivative thereof.

In a further related embodiment, the present invention
provides an isolated nucleic acid molecule comprising a
sequence of nucleotides substantially as set forth in
SEQ ID NO:18 or a nucleotide sequence having at least
60% similarity to the nucleotide sequence set forth in
SEQ ID NO:18 or a nucleotide sequence capable of
hybridising thereto under low stringency conditions at
42°C and wherein said nucleotide sequence encodes a
novel haemopoietin receptor or a derivative thereof.

WO 98/11225
PCT/GB97/02479

In yet a further related embodiment, the present
invention provides an isolated nucleic acid molecule
comprising a sequence of nucleotides substantially as
set forth in SEQ ID NO:24 or a nucleotide sequence
having at least 60% similarity to the nucleotide
sequence set forth in SEQ ID NO:24 or a nucleotide
sequence capable of hybridising thereto under low
stringency conditions at 42°C and wherein said
nucleotide sequence encodes a novel haemopoietin
receptor or a derivative thereof.

Still yet a further embodiment of the present invention
is directed to a sequence of nucleotides substantially
as set forth in SEQ ID NO:28 or a nucleotide sequence
having at least 60% similarity to the nucleotide
sequence set forth in SEQ ID NO:28 or a nucleotide
sequence capable of hybridising thereto under low
stringency conditions at 42°C and wherein said
nucleotide sequence encodes a novel haemopoietin
receptor or a derivative thereof.

In still yet another embodiment, the present invention
provides an isolated nucleic acid molecule comprising a
sequence of nucleotides substantially as set forth in SEQ
ID NO:38 or a nucleotide sequence having at least 60%
similarity to the nucleotide sequence set forth in SEQ
ID NO:38 or a nucleotide sequence capable of hybridising
thereto under low stringency conditions at 42°C and
wherein said nucleotide sequence encodes a novel
haemopoietin receptor or a derivative thereof.

The term "receptor" is used in its broadest sense and 
includes any molecule capable of binding, associating or 
otherwise interacting with a ligand. Generally, the 
interaction will have a signalling effect although the
present invention is not necessarily so limited. For
example, the "receptor" may be in soluble form, often
referred to as a cytokine binding protein. A receptor
may be deemed a receptor notwithstanding that its ligand 
or ligands has or have not been identified. 

Preferably, the novel receptor is derived from a mammal 
or a species of bird. Particularly preferred mammals
include humans, primates, laboratory test animals (e.g.
mice, rats, rabbits, guinea pigs), livestock animals
(e.g. sheep, horses, pigs, cows), companion animals
(e.g. dogs, cats) or captive wild animals (e.g. deer,
foxes, kangaroos). Although the present invention is
exemplified with respect to mice, the scope of the 
subject invention extends to all animals and in 
particular humans. 



The present invention is predicated in part on an
ability to identify members of the haemopoietin receptor
family with limited sequence similarity. Based on this
approach, a genetic sequence has been identified in
accordance with the present invention which encodes a
novel receptor. The expressed genetic sequence is
referred to herein as "NR6". Different forms of NR6 are
referred to as, for example, NR6.1, NR6.2 and NR6.3.
The nucleotide and corresponding amino acid sequences
for these molecules are represented in SEQ ID NOs:12, 14
and 16, respectively.

Preferred human and murine nucleic acid sequences for
NR6 or its derivatives include sequences from brain,
liver, kidney, neonatal, embryonic, cancer or
tumour-derived tissues.

Reference herein to a low stringency at 42°C includes
and encompasses from at least about 1% v/v to at least
about 15% v/v formamide and from at least about 1M to at
least about 2M salt for hybridisation, and at least
about 1M to at least about 2M salt for washing
conditions. Alternative stringency conditions may be
applied where necessary, such as medium stringency,
which includes and encompasses from at least about 16%
v/v to at least about 30% v/v formamide and from at
least about 0.5M to at least about 0.9M salt for
hybridisation, and at least about 0.5M to at least about
0.9M salt for washing conditions, or high stringency,
which includes and encompasses from at least about 31%
v/v to at least about 50% v/v formamide and from at
least about 0.01M to at least about 0.15M salt for
hybridisation, and at least about 0.01M to at least
about 0.15M salt for washing conditions.
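
The three sets of conditions recited above may be summarised in a small lookup, sketched here in Python. The sketch treats the recited "at least about" figures as hard bounds, which is a simplification, and uses a single salt range for both hybridisation and washing, as the text recites. The names `Stringency` and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stringency:
    name: str
    formamide_pct: tuple  # (min, max) formamide, % v/v
    salt_molar: tuple     # (min, max) salt, M (hybridisation and washing)


# Ranges as recited in the paragraph above, taken as hard limits for this sketch.
LEVELS = (
    Stringency("low", (1.0, 15.0), (1.0, 2.0)),
    Stringency("medium", (16.0, 30.0), (0.5, 0.9)),
    Stringency("high", (31.0, 50.0), (0.01, 0.15)),
)


def classify(formamide_pct, salt_molar):
    """Return the stringency level whose ranges contain both values, else None."""
    for level in LEVELS:
        f_lo, f_hi = level.formamide_pct
        s_lo, s_hi = level.salt_molar
        if f_lo <= formamide_pct <= f_hi and s_lo <= salt_molar <= s_hi:
            return level.name
    return None


if __name__ == "__main__":
    print(classify(10.0, 1.5))
    print(classify(40.0, 0.1))
```

A combination that falls between or outside the recited ranges, such as 10% v/v formamide with 0.1M salt, returns `None` rather than being forced into the nearest level.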

The nucleic acid molecules contemplated by the present 
invention are generally in isolated form and are
preferably cDNA or genomic DNA molecules. In a
particularly preferred embodiment, the nucleic acid
molecules are in vectors and most preferably expression
vectors to enable expression in a suitable host cell.
Particularly useful host cells include prokaryotic
cells, mammalian cells, yeast cells and insect cells.
The cells may also be in the form of a cell line. 

Accordingly, another aspect of the present invention
provides an expression vector comprising a nucleic acid
molecule encoding the novel haemopoietin receptor or a
derivative thereof as hereinbefore described, said
expression vector capable of expression in a selected
host cell.

Another aspect of the present invention contemplates a
method for cloning a nucleotide sequence encoding NR6 or
a derivative thereof, said method comprising searching a
nucleotide data base for a sequence which encodes the
amino acid sequence set forth in SEQ ID NO:1, designing
one or more oligonucleotide primers based on the
nucleotide sequence located in the search, screening a
nucleic acid library with said one or more
oligonucleotides and obtaining a clone therefrom which
encodes said NR6 or part thereof.
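
The data base search step described above may be sketched, purely as an illustration, as a six-frame translation followed by a search for the motif of SEQ ID NO:1. The Python code below assumes the standard genetic code and plain DNA strings. The function `six_frame_hits` and the example sequence are hypothetical and are not taken from any data base.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MOTIF = re.compile(r"WS[ACDEFGHIKLMNPQRSTVWY]WS")  # Trp Ser Xaa Trp Ser


def translate(seq):
    """Translate in frame 0; codons with ambiguous bases are written as 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_hits(dna):
    """Return (strand, frame, peptide_index, motif) for every motif occurrence."""
    dna = dna.upper()
    strands = {"+": dna, "-": dna.translate(COMPLEMENT)[::-1]}
    hits = []
    for strand, seq in strands.items():
        for frame in range(3):
            protein = translate(seq[frame:])
            for m in MOTIF.finditer(protein):
                hits.append((strand, frame, m.start(), m.group()))
    return hits


if __name__ == "__main__":
    example = "GGCTGGAGTGATTGGAGCTAA"  # invented sequence, illustration only
    print(six_frame_hits(example))
```

Each hit identifies the strand and reading frame in which the motif occurs, which is the information needed to design primers against the located nucleotide sequence.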

Once a novel nucleotide sequence is obtained as
indicated above encoding NR6, oligonucleotides may be
designed which bind cDNA clones with high stringency.
Direct colony hybridisation may be employed or PCR
amplification may be used. The use of oligonucleotide
primers which bind under conditions of high stringency
ensures rapid cloning of a molecule encoding the novel
NR6 and less time is required in screening out cloning
artefacts. However, depending on the primers used, low
or medium stringency conditions may also be employed.

Alternatively, a library may be screened directly such
as using oligonucleotides set forth in SEQ ID NO:7 or
SEQ ID NO:8 or a mixture of both oligonucleotides may be
used. In addition, one or more of the oligonucleotides
defined in SEQ ID NOs:2 to 11 may also be used.

Preferably, the nucleic acid library is a cDNA, genomic, 
cDNA expression or mRNA library. 

Preferably, the nucleic acid library is a cDNA
expression library. 

Preferably, the nucleotide data base is of human or 
murine origin and of brain, liver, kidney, neonatal
tissue, embryonic tissue, tumour or cancer tissue
origin. 

Preferred percentage similarities to the reference 
nucleotide sequences include at least about 70%, more 
preferably at least about 80%, still more preferably at
least about 90% and even more preferably at least about 
95% or above. 
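
The preferred similarity levels may be applied to a pairwise comparison as sketched below in Python. The specification does not prescribe an algorithm for calculating similarity, so the sketch substitutes simple percent identity over two sequences that have already been aligned to equal length. The names `percent_identity` and `highest_tier` are illustrative only, and the 60% tier is taken from the embodiments recited earlier.

```python
# Thresholds recited in the specification, highest first.
TIERS = (95, 90, 80, 70, 60)


def percent_identity(a, b):
    """Percent of identical positions in two pre-aligned sequences ('-' = gap)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned to the same non-zero length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


def highest_tier(pct):
    """Return the highest recited threshold met by pct, or None if below 60%."""
    for tier in TIERS:
        if pct >= tier:
            return tier
    return None


if __name__ == "__main__":
    pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(pct, highest_tier(pct))
```

A similarity measure that scores conservative substitutions would generally give values at least as high as percent identity, so this sketch is a conservative stand-in.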

Another aspect of the present invention provides an
isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a novel haemopoietin receptor or
derivative thereof having an amino acid sequence as set
forth in SEQ ID NO:13 or having at least about 50%
similarity to all or part thereof.

Still yet another aspect of the present invention
provides an isolated nucleic acid molecule comprising a
sequence of nucleotides encoding a novel haemopoietin
receptor or derivative thereof having an amino acid
sequence as set forth in SEQ ID NO:15 or having at least
about 50% similarity to all or part thereof.

Even yet another aspect of the present invention
provides an isolated nucleic acid molecule comprising a
sequence of nucleotides encoding a novel haemopoietin
receptor or derivative thereof having an amino acid
sequence as set forth in SEQ ID NO:17 or having at least
about 50% similarity to all or part thereof.

A further aspect of the present invention provides an
isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a novel haemopoietin receptor or
derivative thereof having an amino acid sequence as set
forth in SEQ ID NO:19 or having at least about 50%
similarity to all or part thereof.

Even yet another aspect of the present invention
provides an isolated nucleic acid molecule comprising a
sequence of nucleotides encoding a novel haemopoietin
receptor or derivative thereof having an amino acid
sequence as set forth in SEQ ID NO:25 or having at least
about 50% similarity to all or part thereof.



Another aspect of the present invention provides an
isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a novel haemopoietin receptor or
derivative thereof having an amino acid sequence as set
forth in one or more of SEQ ID NOs:29 or having at least
about 50% similarity to all or part thereof.

Preferably, the percentage amino acid similarity is at 
least about 60%, more preferably at least about 70%, 
even more preferably at least about 80-85% and still 
even more preferably at least about 90-95% or greater. 

The NR6 polypeptide contemplated by the present
invention includes, therefore, derivatives which are
components, parts, fragments, homologues or analogues of
the novel haemopoietin receptors which are preferably
encoded by all or part of a nucleotide sequence
substantially as set forth in SEQ ID NO:12 or 14 or 16 or
18 or 25 or 20 or 24 or 28 or 38 or a molecule having at
least about 60% nucleotide similarity to all or part
thereof or a molecule capable of hybridising to the
nucleotide sequence set forth in SEQ ID NO:12 or 14 or
16 or 18 or 20 or 24 or 28 or 38 or a complementary form
thereof. The NR6 molecule may be glycosylated or
non-glycosylated. When in glycosylated form, the
glycosylation may be substantially the same as naturally
occurring haemopoietin receptor or may be a modified form
of glycosylation. Altered or differential glycosylation
states may or may not affect binding activity of the
novel receptor.

The NR6 haemopoietin receptor may be in soluble form or
may be expressed on a cell surface or conjugated or
fused to a solid support or another molecule.

As stated above, the present invention further
contemplates a range of derivatives of NR6. Derivatives
include fragments, parts, portions, mutants, homologues
and analogues of the NR6 polypeptide and corresponding
genetic sequence. Derivatives also include single or
multiple amino acid substitutions, deletions and/or
additions to NR6 or single or multiple nucleotide
substitutions, deletions and/or additions to the genetic
sequence encoding NR6. "Additions" to amino acid
sequences or nucleotide sequences include fusions with
other peptides, polypeptides or proteins or fusions to
nucleotide sequences. Reference herein to "NR6"
includes reference to all derivatives thereof including
functional derivatives or NR6 immunologically
interactive derivatives.

Analogues of NR6 contemplated herein include, but are
not limited to, modification to side chains,
incorporation of unnatural amino acids and/or their
derivatives during peptide, polypeptide or protein
synthesis and the use of crosslinkers and other methods
which impose conformational constraints on the
proteinaceous molecule or their analogues.

Examples of side chain modifications contemplated by the
present invention include modifications of amino groups
such as by reductive alkylation by reaction with an
aldehyde followed by reduction with NaBH4; amidination
with methylacetimidate; acylation with acetic anhydride;
carbamoylation of amino groups with cyanate;
trinitrobenzylation of amino groups with
2,4,6-trinitrobenzene sulphonic acid (TNBS); acylation of
amino groups with succinic anhydride and
tetrahydrophthalic anhydride; and pyridoxylation of
lysine with pyridoxal-5-phosphate followed by reduction
with NaBH4.

The guanidine group of arginine residues may be modified
by the formation of heterocyclic condensation products
with reagents such as 2,3-butanedione, phenylglyoxal and
glyoxal.

The carboxyl group may be modified by carbodiimide
activation via O-acylisourea formation followed by
subsequent derivatisation, for example, to a
corresponding amide.

Sulphydryl groups may be modified by methods such as
carboxymethylation with iodoacetic acid or
iodoacetamide; performic acid oxidation to cysteic acid;
formation of mixed disulphides with other thiol
compounds; reaction with maleimide, maleic anhydride or
other substituted maleimide; formation of mercurial
derivatives using 4-chloromercuribenzoate,
4-chloromercuriphenylsulphonic acid, phenylmercury
chloride, 2-chloromercuri-4-nitrophenol and other
mercurials; carbamoylation with cyanate at alkaline pH.

Tryptophan residues may be modified by, for example,
oxidation with N-bromosuccinimide or alkylation of the
indole ring with 2-hydroxy-5-nitrobenzyl bromide or
sulphenyl halides. Tyrosine residues, on the other hand,
may be altered by nitration with tetranitromethane to
form a 3-nitrotyrosine derivative.

Modification of the imidazole ring of a histidine
residue may be accomplished by alkylation with
iodoacetic acid derivatives or N-carbethoxylation with
diethylpyrocarbonate.

Examples of incorporating unnatural amino acids and
derivatives during peptide synthesis include, but are
not limited to, use of norleucine, 4-amino butyric acid,
4-amino-3-hydroxy-5-phenylpentanoic acid,
6-aminohexanoic acid, t-butylglycine, norvaline,
phenylglycine, ornithine, sarcosine,
4-amino-3-hydroxy-6-methylheptanoic acid, 2-thienyl
alanine and/or D-isomers of amino acids. A list of
unnatural amino acids contemplated herein is shown in
Table 1.

These types of modifications may be important to
stabilise NR6 if administered to an individual or for
use as a diagnostic reagent.

Crosslinkers can be used, for example, to stabilise 3D
conformations, using homo-bifunctional crosslinkers such
as the bifunctional imido esters having (CH2)n spacer
groups with n=1 to n=6, glutaraldehyde,
N-hydroxysuccinimide esters and hetero-bifunctional
reagents which usually contain an amino-reactive moiety
such as N-hydroxysuccinimide and another group
specific-reactive moiety such as maleimido or dithio
moiety (SH) or carbodiimide (COOH). In addition,
peptides can be conformationally constrained by, for
example, incorporation of Cα and Nα-methylamino acids,
introduction of double bonds between Cα and Cβ atoms of
amino acids and the formation of cyclic peptides or
analogues by introducing covalent bonds such as forming
an amide bond between the N and C termini, between two
side chains or between a side chain and the N or C
terminus.

TABLE 1

Non-conventional amino acid                       Code

aminobutyric acid                                 Abu
amino-α-methylbutyrate                            Mgabu
aminocyclopropane-carboxylate                     Cpro
aminoisobutyric acid                              Aib
aminonorbornyl-carboxylate                        Norb
cyclohexylalanine                                 Chexa
cyclopentylalanine                                Cpen
D-alanine                                         Dal
D-arginine                                        Darg
D-aspartic acid                                   Dasp
D-cysteine                                        Dcys
D-glutamine                                       Dgln
D-glutamic acid                                   Dglu
D-histidine                                       Dhis
D-isoleucine                                      Dile
D-leucine                                         Dleu
D-lysine                                          Dlys
D-methionine                                      Dmet
D-ornithine                                       Dorn
D-phenylalanine                                   Dphe
D-proline                                         Dpro
D-serine                                          Dser
D-threonine                                       Dthr
D-tryptophan                                      Dtrp
D-tyrosine                                        Dtyr
D-valine                                          Dval
D-α-methylalanine                                 Dmala
D-α-methylarginine                                Dmarg
D-α-methylasparagine                              Dmasn
D-α-methylaspartate                               Dmasp
D-α-methylcysteine                                Dmcys
D-α-methylglutamine                               Dmgln
D-α-methylhistidine                               Dmhis
D-α-methylisoleucine                              Dmile
D-α-methylleucine                                 Dmleu
D-α-methyllysine                                  Dmlys
D-α-methylmethionine                              Dmmet
D-α-methylornithine                               Dmorn
D-α-methylphenylalanine                           Dmphe
D-α-methylproline                                 Dmpro
D-α-methylserine                                  Dmser
D-α-methylthreonine                               Dmthr
D-α-methyltryptophan                              Dmtrp
D-α-methyltyrosine                                Dmty
D-α-methylvaline                                  Dmval
D-N-methylalanine                                 Dnmala
D-N-methylarginine                                Dnmarg
D-N-methylasparagine                              Dnmasn
D-N-methylaspartate                               Dnmasp
D-N-methylcysteine                                Dnmcys
D-N-methylglutamine                               Dnmgln
D-N-methylglutamate                               Dnmglu
D-N-methylhistidine                               Dnmhis
D-N-methylisoleucine                              Dnmile
D-N-methylleucine                                 Dnmleu
D-N-methyllysine                                  Dnmlys
D-N-methylmethionine                              Dnmmet
D-N-methylornithine                               Dnmorn
D-N-methylphenylalanine                           Dnmphe
D-N-methylproline                                 Dnmpro
D-N-methylserine                                  Dnmser
D-N-methylthreonine                               Dnmthr
D-N-methyltryptophan                              Dnmtrp
D-N-methyltyrosine                                Dnmtyr
D-N-methylvaline                                  Dnmval
γ-aminobutyric acid                               Gabu
L-t-butylglycine                                  Tbug
L-ethylglycine                                    Etg
L-homophenylalanine                               Hphe
L-α-methylarginine                                Marg
L-α-methylaspartate                               Masp
L-α-methylcysteine                                Mcys
L-α-methylglutamine                               Mgln
L-α-methylhistidine                               Mhis
L-α-methylisoleucine                              Mile
L-α-methylleucine                                 Mleu
L-α-methylmethionine                              Mmet
L-α-methylnorvaline                               Mnva
L-α-methylphenylalanine                           Mphe
L-α-methylserine                                  Mser
L-α-methyltryptophan                              Mtrp
L-α-methylvaline                                  Mval
N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine    Nnbhm
1-carboxy-1-(2,2-diphenylethylamino)cyclopropane  Nmbc
L-N-methylalanine                                 Nmala
L-N-methylarginine                                Nmarg
L-N-methylasparagine                              Nmasn
L-N-methylaspartic acid                           Nmasp
L-N-methylcysteine                                Nmcys
L-N-methylglutamine                               Nmgln
L-N-methylglutamic acid                           Nmglu
L-N-methylhistidine                               Nmhis
L-N-methylisoleucine                              Nmile
L-N-methylleucine                                 Nmleu
L-N-methyllysine                                  Nmlys
L-N-methylmethionine                              Nmmet
L-N-methylnorleucine                              Nmnle
L-N-methylnorvaline                               Nmnva
L-N-methylornithine                               Nmorn
L-N-methylphenylalanine                           Nmphe
L-N-methylproline                                 Nmpro
L-N-methylserine                                  Nmser
L-N-methylthreonine                               Nmthr
L-N-methyltryptophan                              Nmtrp
L-N-methyltyrosine                                Nmtyr
L-N-methylvaline                                  Nmval
L-N-methylethylglycine                            Nmetg
L-N-methyl-t-butylglycine                         Nmtbug
L-norleucine                                      Nle
L-norvaline                                       Nva
α-methyl-aminoisobutyrate                         Maib
α-methyl-γ-aminobutyrate                          Mgabu
α-methylcyclohexylalanine                         Mchexa
α-methylcyclopentylalanine                        Mcpen
α-methyl-α-naphthylalanine                        Manap
α-methylpenicillamine                             Mpen
N-(4-aminobutyl)glycine                           Nglu
N-(2-aminoethyl)glycine                           Naeg
N-(3-aminopropyl)glycine                          Norn
N-amino-α-methylbutyrate                          Nmaabu
α-naphthylalanine                                 Anap
N-benzylglycine                                   Nphe
N-(2-carbamylethyl)glycine                        Ngln
N-(carbamylmethyl)glycine                         Nasn
N-(2-carboxyethyl)glycine                         Nglu
N-(carboxymethyl)glycine                          Nasp
N-cyclobutylglycine                               Ncbut
N-cycloheptylglycine                              Nchep
N-cyclohexylglycine                               Nchex
N-cyclodecylglycine                               Ncdec
N-cyclododecylglycine                             Ncdod
N-cyclooctylglycine                               Ncoct
N-cyclopropylglycine                              Ncpro
N-cycloundecylglycine                             Ncund
N-(2,2-diphenylethyl)glycine                      Nbhm
N-(3,3-diphenylpropyl)glycine                     Nbhe
N-(3-guanidinopropyl)glycine                      Narg
N-(1-hydroxyethyl)glycine                         Nthr
N-(hydroxyethyl)glycine                           Nser
N-(imidazolylethyl)glycine                        Nhis
N-(3-indolylethyl)glycine                         Nhtrp
N-methyl-γ-aminobutyrate                          Nmgabu
N-methylcyclohexylalanine                         Nmchexa
N-methylcyclopentylalanine                        Nmcpen
N-methylglycine                                   Nala
N-methylaminoisobutyrate                          Nmaib
N-(1-methylpropyl)glycine                         Nile
N-(2-methylpropyl)glycine                         Nleu
N-(1-methylethyl)glycine                          Nval
N-methyl-α-naphthylalanine                        Nmanap
N-methylpenicillamine                             Nmpen
N-(p-hydroxyphenyl)glycine                        Nhtyr
N-(thiomethyl)glycine                             Ncys
penicillamine                                     Pen
L-α-methylalanine                                 Mala
L-α-methylasparagine                              Masn
L-α-methyl-t-butylglycine                         Mtbug
L-methylethylglycine                              Metg
L-α-methylglutamate                               Mglu
L-α-methylhomophenylalanine                       Mhphe
N-(2-methylthioethyl)glycine                      Nmet
L-α-methyllysine                                  Mlys
L-α-methylnorleucine                              Mnle
L-α-methylornithine                               Morn
L-α-methylproline                                 Mpro
L-α-methylthreonine                               Mthr
L-α-methyltyrosine                                Mtyr
L-N-methylhomophenylalanine                       Nmhph
N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine   Nnbhe

The present invention further contemplates chemical 
analogues of NR6 capable of acting as antagonists or 
agonists of NR6 or which can act as functional analogues 
of NR6. Chemical analogues may not necessarily be 

derived from NR6 but may share certain conformational
similarities. Alternatively, chemical analogues may be
specifically designed to mimic certain physicochemical
properties of NR6. Chemical analogues may be chemically
synthesised or may be detected following, for example,
natural product screening.

The identification of NR6 permits the generation of a 
range of therapeutic molecules capable of modulating 
expression of NR6 or modulating the activity of NR6.
Modulators contemplated by the present invention
include agonists and antagonists of NR6 expression.
Antagonists of NR6 expression include antisense
molecules, ribozymes and co-suppression molecules.
Agonists include molecules which increase promoter
ability or interfere with negative regulatory
mechanisms. Agonists of NR6 include molecules which
overcome any negative regulatory mechanism. Antagonists
of NR6 include antibodies and inhibitor peptide
fragments.

Other derivatives contemplated by the present invention 
include a range of glycosylation variants from a 
completely unglycosylated molecule to a modified 
glycosylated molecule. Altered glycosylation patterns 
may result from expression of recombinant molecules in 
different host cells. 



Another embodiment of the present invention
contemplates a method for modulating expression of NR6
in a subject such as a human or mouse, said method
comprising contacting the genetic sequence encoding NR6
with an effective amount of a modulator of NR6
expression for a time and under conditions sufficient to
up-regulate or down-regulate or otherwise modulate
expression of NR6. Modulating NR6 expression provides a
means of modulating NR6-ligand interaction or NR6
stimulation of cell activities.

Another aspect of the present invention contemplates a
method of modulating activity of NR6 in a human, said
method comprising administering to said human a
modulating effective amount of a molecule for a time and
under conditions sufficient to increase or decrease NR6
activity. The molecule may be a proteinaceous molecule
or a chemical entity and may also be a derivative of NR6
or its ligand or a chemical analogue or truncation
mutant of NR6 or its ligand.

The present invention, therefore, contemplates a
pharmaceutical composition comprising NR6 or a
derivative thereof or a modulator of NR6 expression or
NR6 activity and one or more pharmaceutically acceptable
carriers and/or diluents. These components are referred
to as the "active ingredients".

The pharmaceutical forms suitable for injectable use
include sterile aqueous solutions (where water soluble)
and sterile powders for the extemporaneous preparation
of sterile injectable solutions. The form must be stable
under the conditions of manufacture and storage and must
be preserved against the contaminating action of
microorganisms such as bacteria and fungi. The carrier
can be a solvent or dilution medium comprising, for
example, water, ethanol, polyol (for example, glycerol,
propylene glycol and liquid polyethylene glycol, and the
like), suitable mixtures thereof, and vegetable oils.
The proper fluidity can be maintained, for example, by
the use of surfactants. The prevention of the action
of microorganisms can be brought about by various
antibacterial and antifungal agents, for example,
parabens, chlorobutanol, phenol, sorbic acid,
thimerosal and the like. In many cases, it will be
preferable to include isotonic agents, for example,
sugars or sodium chloride. Prolonged absorption of the
injectable compositions can be brought about by the use
in the compositions of agents delaying absorption, for
example, aluminum monostearate and gelatin.

Sterile injectable solutions are prepared by
incorporating the active compounds in the required
amount in the appropriate solvent with various of the
other ingredients enumerated above, as required,
followed by filter sterilization. In the case of
sterile powders for the preparation of sterile
injectable solutions, the preferred methods of
preparation are vacuum drying and the freeze-drying
technique, which yield a powder of the active ingredient
plus any additional desired ingredient from a previously
sterile-filtered solution thereof.

When the active ingredients are suitably protected they
may be orally administered, for example, with an inert
diluent or with an assimilable edible carrier, or they may
be enclosed in a hard or soft shell gelatin capsule, or
compressed into tablets, or incorporated directly with
the food of the diet. For oral therapeutic
administration, the active compound may be incorporated
with excipients and used in the form of ingestible
tablets, buccal tablets, troches, capsules, elixirs,
suspensions, syrups, wafers, and the like. Such
compositions and preparations should contain at least 1%
by weight of active compound. The percentage of the
compositions and preparations may, of course, be varied
and may conveniently be between about 5 and about 80% of
the weight of the unit. The amount of active compound in
such therapeutically useful compositions is such that a
suitable dosage will be obtained. Preferred compositions
or preparations according to the present invention are
prepared so that an oral dosage unit form contains
between about 0.1 µg and 2000 mg of active compound.
Alternative dosage amounts include from about 1 µg to
about 1000 mg and from about 10 µg to about 500 mg.






The tablets, troches, pills, capsules and the like may
also contain the components as listed hereafter: a
binder such as gum, acacia, corn starch or gelatin;
excipients such as dicalcium phosphate; a
disintegrating agent such as corn starch, potato starch,
alginic acid and the like; a lubricant such as
magnesium stearate; and a sweetening agent such as
sucrose, lactose or saccharin may be added or a
flavouring agent such as peppermint, oil of wintergreen,






WO 98/11225



PCT/GB97/02479 



or cherry flavouring. When the dosage unit form is a
capsule, it may contain, in addition to materials of the
above type, a liquid carrier. Various other materials
may be present as coatings or to otherwise modify the
physical form of the dosage unit. For instance,
tablets, pills, or capsules may be coated with shellac,
sugar or both. A syrup or elixir may contain the active
compound, sucrose as a sweetening agent, methyl and
propylparabens as preservatives, a dye and flavouring
such as cherry or orange flavour. Of course, any
material used in preparing any dosage unit form should
be pharmaceutically pure and substantially non-toxic in
the amounts employed. In addition, the active
compound(s) may be incorporated into sustained-release
preparations and formulations.

The present invention also extends to forms suitable for
topical application such as creams, lotions and gels as
well as a range of "paints" which are applied to skin
and through which the active ingredients are absorbed.

Pharmaceutically acceptable carriers and/or diluents
include any and all solvents, dispersion media,
coatings, antibacterial and antifungal agents, isotonic
and absorption delaying agents and the like. The use of
such media and agents for pharmaceutically active
substances is well known in the art and, except insofar
as any conventional media or agent is incompatible with
the active ingredient, their use in the therapeutic
compositions is contemplated. Supplementary active
ingredients can also be incorporated into the
compositions.

It is especially advantageous to formulate parenteral
compositions in dosage unit form for ease of
administration and uniformity of dosage. Dosage unit
form as used herein refers to physically discrete units
suited as unitary dosages for the mammalian subjects to
be treated, each unit containing a predetermined
quantity of active material calculated to produce the
desired therapeutic effect in association with the
required pharmaceutical carrier. The specification for
the novel dosage unit forms of the invention is
dictated by and directly dependent on (a) the unique
characteristics of the active material and the
particular therapeutic effect to be achieved, and (b)
the limitations inherent in the art of compounding such
an active material for the treatment of disease in
living subjects having a diseased condition in which
bodily health is impaired as herein disclosed in detail.

The principal active ingredient is compounded for
convenient and effective administration in effective
amounts with a suitable pharmaceutically acceptable
carrier in dosage unit form as hereinbefore disclosed.
A unit dosage form can, for example, contain the
principal active compound in amounts ranging from 0.5 µg
to about 2000 mg. Expressed in proportions, the active
compound is generally present in from about 0.5 µg to
about 2000 mg/ml of carrier. In the case of
compositions containing supplementary active
ingredients, the dosages are determined by reference to
the usual dose and manner of administration of the said
ingredients.

Dosages may also be expressed per body weight of the
recipient. For example, from about 10 ng to about 1000
mg/kg body weight, from about 100 ng to about 500 mg/kg
body weight and from about 1 µg to about 250 mg/kg body
weight may be administered.
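As a purely arithmetic illustration of the per-body-weight ranges above, the following minimal sketch scales each range to a recipient of a given weight. The range labels ("broad", "intermediate", "narrow"), the function name and the assumption that both ends of each range are expressed per kilogram are hypothetical conveniences and are not taken from the specification.

```python
# Illustrative arithmetic only. Range labels and the per-kg reading of the
# lower bounds are assumptions made for this sketch.
NG_TO_MG = 1e-6  # 1 ng expressed in mg
UG_TO_MG = 1e-3  # 1 ug expressed in mg

# (low, high) bounds in mg per kg body weight
RANGES_MG_PER_KG = {
    "broad": (10 * NG_TO_MG, 1000.0),         # about 10 ng to about 1000 mg/kg
    "intermediate": (100 * NG_TO_MG, 500.0),  # about 100 ng to about 500 mg/kg
    "narrow": (1 * UG_TO_MG, 250.0),          # about 1 ug to about 250 mg/kg
}


def dose_range_mg(body_weight_kg, range_name="broad"):
    """Return the (low, high) total amount in mg for a recipient of the given weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low, high = RANGES_MG_PER_KG[range_name]
    return low * body_weight_kg, high * body_weight_kg


if __name__ == "__main__":
    for name in RANGES_MG_PER_KG:
        low, high = dose_range_mg(70.0, name)
        print(f"{name}: {low:.6g} mg to {high:.6g} mg")
```

The three ranges are nested, so the choice of range only narrows the bounds; the sketch says nothing about which amount within a range is appropriate.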

The pharmaceutical composition may also comprise genetic
molecules such as a vector capable of transfecting
target cells where the vector carries a nucleic acid
molecule capable of modulating NR6 expression or NR6
activity. The vector may, for example, be a viral
vector.

Still another aspect of the present invention is
directed to antibodies to NR6 and its derivatives. Such
antibodies may be monoclonal or polyclonal and may be
selected from naturally occurring antibodies to NR6 or
may be specifically raised to NR6 or derivatives
thereof. In the case of the latter, NR6 or its
derivatives may first need to be associated with a
carrier molecule. The antibodies and/or recombinant NR6
or its derivatives of the present invention are
particularly useful as therapeutic or diagnostic agents.
For example, NR6 antibodies or antibodies to its ligand
may act as antagonists.

For example, NR6 and its derivatives can be used to
screen for naturally occurring antibodies to NR6. These
may occur, for example, in some autoimmune diseases.
Alternatively, specific antibodies can be used to screen
for NR6. Techniques for such assays are well known in
the art and include, for example, sandwich assays and
ELISA. Knowledge of NR6 levels may be important for
diagnosis of certain cancers or a predisposition to
cancers or for monitoring certain therapeutic protocols.

Antibodies to NR6 of the present invention may be
monoclonal or polyclonal. Alternatively, fragments of
antibodies may be used such as Fab fragments.
Furthermore, the present invention extends to
recombinant and synthetic antibodies and to antibody
hybrids. A "synthetic antibody" is considered herein to
include fragments and hybrids of antibodies. The
antibodies of this aspect of the present invention are
particularly useful for immunotherapy and may also be
used as a diagnostic tool for assessing apoptosis or
monitoring the program of a therapeutic regimen.

For example, specific antibodies can be used to screen
for NR6 proteins. The latter would be important, for
example, as a means for screening for levels of NR6 in a
cell extract or other biological fluid or purifying NR6
made by recombinant means from culture supernatant
fluid. Techniques for the assays contemplated herein
are known in the art and include, for example, sandwich
assays and ELISA.

It is within the scope of this invention to include any
second antibodies (monoclonal, polyclonal or fragments
of antibodies or synthetic antibodies) directed to the
first mentioned antibodies discussed above. Both the
first and second antibodies may be used in detection
assays or a first antibody may be used with a
commercially available anti-immunoglobulin antibody. An
antibody as contemplated herein includes any antibody
specific to any region of NR6.

Both polyclonal and monoclonal antibodies are obtainable
by immunization with the enzyme or protein and either
type is utilizable for immunoassays. The methods of
obtaining both types of sera are well known in the art.
Polyclonal sera are less preferred but are relatively
easily prepared by injection of a suitable laboratory
animal with an effective amount of NR6, or antigenic
parts thereof, collecting serum from the animal, and
isolating specific sera by any of the known
immunoadsorbent techniques. Although antibodies
produced by this method are utilizable in virtually any
type of immunoassay, they are generally less favoured
because of the potential heterogeneity of the product.

The use of monoclonal antibodies in an immunoassay is
particularly preferred because of the ability to produce
them in large quantities and the homogeneity of the
product. The preparation of hybridoma cell lines for
monoclonal antibody production derived by fusing an
immortal cell line and lymphocytes sensitized against
the immunogenic preparation can be done by techniques
which are well known to those who are skilled in the
art.

Another aspect of the present invention contemplates a
method for detecting NR6 in a biological sample from a
subject, said method comprising contacting said
biological sample with an antibody specific for NR6 or
its derivatives or homologues for a time and under
conditions sufficient for an antibody-NR6 complex to
form, and then detecting said complex.

Detection of NR6 may be accomplished in a number of
ways, such as by Western blotting and ELISA procedures.
A wide range of immunoassay techniques are available as
can be seen by reference to US Patent Nos. 4,016,043,
4,424,279 and 4,018,653. These, of course, include both
single-site and two-site or "sandwich" assays of the
non-competitive types, as well as the traditional
competitive binding assays. These assays also include
direct binding of a labelled antibody to a target.

Sandwich assays are among the most useful and commonly
used assays and are favoured for use in the present
invention. A number of variations of the sandwich assay
technique exist, and all are intended to be encompassed
by the present invention. Briefly, in a typical forward
assay, an unlabelled antibody is immobilized on a solid
substrate and the sample to be tested brought into
contact with the bound molecule. After a suitable
period of incubation, for a period of time sufficient to
allow formation of an antibody-antigen complex, a second
antibody specific to the antigen, labelled with a
reporter molecule capable of producing a detectable
signal, is then added and incubated, allowing time
sufficient for the formation of another complex of
antibody-antigen-labelled antibody. Any unreacted
material is washed away, and the presence of the antigen
is determined by observation of a signal produced by the
reporter molecule. The results may either be
qualitative, by simple observation of the visible
signal, or may be quantitated by comparing with a
control sample containing known amounts of hapten.
Variations on the forward assay include a simultaneous
assay, in which both sample and labelled antibody are
added simultaneously to the bound antibody. These
techniques are well known to those skilled in the art,
including any minor variations as will be readily
apparent. In accordance with the present invention, the
sample is one which might contain NR6 including cell
extract, tissue biopsy or possibly serum, saliva,
mucosal secretions, lymph, tissue fluid and respiratory
fluid. The sample is, therefore, generally a biological
sample comprising biological fluid but also extends to
fermentation fluid and supernatant fluid such as from a
cell culture.

In the typical forward sandwich assay, a first antibody
having specificity for the NR6 or antigenic parts
thereof, is either covalently or passively bound to a
solid surface. The solid surface is typically glass or
a polymer, the most commonly used polymers being
cellulose, polyacrylamide, nylon, polystyrene, polyvinyl
chloride or polypropylene. The solid supports may be in
the form of tubes, beads, discs or microplates, or any
other surface suitable for conducting an immunoassay.
The binding processes are well-known in the art and
generally consist of cross-linking, covalently binding or
physically adsorbing; the polymer-antibody complex is
then washed in preparation for the test sample. An aliquot
of the sample to be tested is then added to the solid
phase complex and incubated for a period of time
sufficient (e.g. 2-40 minutes or overnight if more






convenient) and under suitable conditions (e.g. from
about room temperature to about 37°C) to allow binding
of any subunit present in the antibody. Following the
incubation period, the antibody subunit solid phase is
washed and dried and incubated with a second antibody
specific for a portion of the hapten. The second
antibody is linked to a reporter molecule which is used
to indicate the binding of the second antibody to the
hapten.



An alternative method involves immobilizing the target
molecules in the biological sample and then exposing the
immobilized target to specific antibody which may or may
not be labelled with a reporter molecule. Depending on
the amount of target and the strength of the reporter
molecule signal, a bound target may be detectable by
direct labelling with the antibody. Alternatively, a
second labelled antibody, specific to the first antibody,
is exposed to the target-first antibody complex to form
a target-first antibody-second antibody tertiary
complex. The complex is detected by the signal emitted
by the reporter molecule.

In another alternative method, the NR6 ligand is
immobilised to a solid support and a biological sample
containing NR6 brought into contact with its immobilised
ligand. Binding between NR6 and its ligand can then be
determined using an antibody to NR6 which itself may be
labelled with a reporter molecule, or a further
anti-immunoglobulin antibody labelled with a reporter
molecule could be used to detect antibody bound to NR6.

By "reporter molecule" as used in the present
specification, is meant a molecule which, by its
chemical nature, provides an analytically identifiable
signal which allows the detection of antigen-bound
antibody. Detection may be either qualitative or
quantitative. The most commonly used reporter molecules
in this type of assay are either enzymes, fluorophores
or radionuclide containing molecules (i.e.
radioisotopes) and chemiluminescent molecules.

In the case of an enzyme immunoassay, an enzyme is
conjugated to the second antibody, generally by means of
glutaraldehyde or periodate. As will be readily
recognized, however, a wide variety of different
conjugation techniques exist, which are readily
available to the skilled artisan. Commonly used enzymes
include horseradish peroxidase, glucose oxidase,
beta-galactosidase and alkaline phosphatase, amongst others.
The substrates to be used with the specific enzymes are
generally chosen for the production, upon hydrolysis by
the corresponding enzyme, of a detectable colour change.
Examples of suitable enzymes include alkaline
phosphatase and peroxidase. It is also possible to
employ fluorogenic substrates, which yield a fluorescent
product rather than the chromogenic substrates noted
above. In all cases, the enzyme-labelled antibody is
added to the first antibody-hapten complex, allowed to
bind, and then the excess reagent is washed away. A
solution containing the appropriate substrate is then
added to the complex of antibody-antigen-antibody. The
substrate will react with the enzyme linked to the
second antibody, giving a qualitative visual signal,
which may be further quantitated, usually
spectrophotometrically, to give an indication of the
amount of hapten which was present in the sample.
"Reporter molecule" also extends to use of cell
agglutination or inhibition of agglutination such as red
blood cells on latex beads, and the like.
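The spectrophotometric quantitation described above can be sketched as interpolation on a standard curve built from control samples containing known amounts of antigen. This is a minimal illustration only: the function name, the example absorbance readings, the concentration units and the choice of piecewise-linear interpolation are all assumptions and are not taken from the specification.

```python
from bisect import bisect_left


def interpolate_concentration(absorbance, standards):
    """Estimate concentration from an absorbance reading.

    `standards` is a list of (concentration, absorbance) pairs measured on
    control samples with known amounts of antigen. The curve is assumed to
    be monotonically increasing; values between two standards are estimated
    by straight-line interpolation.
    """
    points = sorted(standards, key=lambda p: p[1])
    readings = [a for _, a in points]
    if not readings[0] <= absorbance <= readings[-1]:
        raise ValueError("reading lies outside the range of the standard curve")
    i = bisect_left(readings, absorbance)
    if readings[i] == absorbance:
        return points[i][0]
    (c0, a0), (c1, a1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (absorbance - a0) / (a1 - a0)


if __name__ == "__main__":
    # Hypothetical standards: (concentration in ng/ml, absorbance)
    curve = [(0.0, 0.05), (10.0, 0.25), (50.0, 1.05)]
    print(interpolate_concentration(0.65, curve))
```

Readings outside the standard range are rejected rather than extrapolated, since the signal is only calibrated between the lowest and highest controls.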

Alternately, fluorescent compounds, such as fluorescein
and rhodamine, may be chemically coupled to antibodies
without altering their binding capacity. When activated
by illumination with light of a particular wavelength,
the fluorochrome-labelled antibody absorbs the light
energy, inducing a state of excitability in the
molecule, followed by emission of the light at a
characteristic colour visually detectable with a light
microscope. As in the EIA, the fluorescent labelled
antibody is allowed to bind to the first antibody-hapten
complex. After washing off the unbound reagent, the
remaining tertiary complex is then exposed to the light
of the appropriate wavelength; the fluorescence observed
indicates the presence of the hapten of interest.
Immunofluorescence and EIA techniques are both very well
established in the art and are particularly preferred
for the present method. However, other reporter
molecules, such as radioisotope, chemiluminescent or
bioluminescent molecules, may also be employed.

The present invention also contemplates genetic assays
such as those involving PCR analysis to detect the NR6
gene or its derivatives. Alternative methods or methods
used in conjunction include direct nucleotide sequencing
or mutation scanning such as single stranded
conformational polymorphism analysis (SSCP), specific
oligonucleotide hybridisation, and methods such as direct
protein truncation tests.

The nucleic acid molecules of the present invention may
be DNA or RNA. When the nucleic acid molecule is in a
DNA form, it may be genomic DNA or cDNA. RNA forms of
the nucleic acid molecules of the present invention are
generally mRNA.

Although the nucleic acid molecules of the present
invention are generally in isolated form, they may be
integrated into or ligated to or otherwise fused or
associated with other genetic molecules such as vector
molecules and in particular expression vector molecules.
Vectors and expression vectors are generally capable of
replication and, if applicable, expression in one or
both of a prokaryotic cell or a eukaryotic cell.
Preferably, prokaryotic cells include E. coli, Bacillus
sp. and Pseudomonas sp. Preferred eukaryotic cells
include yeast, fungal, mammalian and insect cells.

Accordingly, another aspect of the present invention
contemplates a genetic construct comprising a vector
portion and a mammalian and more particularly a human
NR6 gene portion, which NR6 gene portion is capable of
encoding an NR6 polypeptide or a functional or
immunologically interactive derivative thereof.

Preferably, the NR6 gene portion of the genetic
construct is operably linked to a promoter on the vector
such that said promoter is capable of directing
expression of said NR6 gene portion in an appropriate
cell.

In addition, the NR6 gene portion of the genetic
construct may comprise all or part of the gene fused to
another genetic sequence such as a nucleotide sequence
encoding maltose binding protein or
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs 
and to prokaryotic or eukaryotic cells comprising same. 

The present invention also extends to any or all
derivatives of NR6 including mutants, parts, fragments,
portions, homologues and analogues or their encoding
genetic sequence including single or multiple nucleotide
or amino acid substitutions, additions and/or deletions
to the naturally occurring nucleotide or amino acid
sequence.

NR6 may be important for the proliferation,
differentiation and survival of a diverse array of cell
types. Accordingly, it is proposed that NR6 or its
functional derivatives be used to regulate development,
maintenance or regeneration in an array of different
cells and tissues in vitro and in vivo. For example,
NR6 is contemplated to be useful in modulating neuronal
proliferation, differentiation and survival.

Soluble NR6 polypeptides are also contemplated to be
useful in the treatment of a range of diseases, injuries
or abnormalities.

Membrane bound or soluble NR6 may be used in vitro on
nerve cells or tissues to modulate proliferation,
differentiation or survival, for example, in grafting
procedures or transplantation.

As stated above, the NR6 of the present invention or its
functional derivatives may be provided in a
pharmaceutical composition comprising the NR6 together
with one or more pharmaceutically acceptable carriers
and/or diluents. In addition, the present invention
contemplates a method of treatment comprising the
administration of an effective amount of an NR6 of the
present invention. The present invention also extends
to antagonists and agonists of NR6 and their use in
therapeutic compositions and methodologies.

A further aspect of the present invention contemplates
the use of NR6 or its functional derivatives in the
manufacture of a medicament for the treatment of NR6
mediated conditions defective or deficient.

Still a further aspect of the present invention
contemplates a ligand for NR6, preferably in isolated or
recombinant form, or a derivative of said ligand.

The present invention further contemplates knockout
animals such as mice or other murine species for the NR6
gene including homozygous and heterozygous knockout
animals. Such animals provide a particularly useful
live in vivo model for studying the effects of NR6 as
well as screening for agents capable of acting as
agonists or antagonists of NR6.

According to this embodiment there is provided a
transgenic animal comprising a mutation in at least one
allele of the gene encoding NR6. Additionally, the
present invention provides a transgenic animal
comprising a mutation in two alleles of the gene
encoding NR6. Preferably, the transgenic animal is a
murine animal such as a mouse or rat.

The present invention is further described by the
following non-limiting Figures and Examples.

In the Figures:

Figure 1 is a diagrammatic representation showing
expansion of sequenced region of the mouse NR6 gene
indicating splicing patterns seen in the three forms of
NR6 cDNA, NR6.1, NR6.2 and NR6.3.

Figure 2 is a representation of the nucleotide sequence
of the mouse NR6 gene, containing exons encoding the
cDNA from nucleotide 148 encoding D50 of the cDNAs shown
in SEQ ID NOs:12 and 14 to the end of the 3′
untranslated region shared by NR6.1, NR6.2 and
NR6.3. In this figure, this region encompasses
nucleotides g1182 to g6617. This sequence is also
defined in SEQ ID NO:28.

Figure 3 is a representation of the nucleotide sequence
of the mouse genomic NR6 gene with additional 5′
sequences. The coding exons of NR6 span approximately
11 kb of the mouse genome. There are 9 coding exons
separated by 8 introns:

    exon 1   at least 239 nt    intron 1   5195 nt
    exon 2   282 nt             intron 2   214 nt
    exon 3   130 nt             intron 3   107 nt
    exon 4   170 nt             intron 4   1372 nt
    exon 5   158 nt             intron 5   68 nt
    exon 6   169 nt             intron 6   2020 nt
    exon 7   188 nt             intron 7   104 nt
    exon 8   43 nt              intron 8   181 nt
    exon 9   252 nt

Exon 1 encodes the signal sequence, exon 2 the Ig-like
domain, exons 3 to 6 the hemopoietin domain. Exons 7, 8
and 9 are alternatively spliced.

Figure 4 is a diagrammatic representation showing the 
genomic structure of murine NR-6. 


Figure 5 is a diagrammatic representation showing 
target ting of the NR6 locus by homologous recombination. 
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Single and three letter abbreviations for amino acid 
residues used in the specification are summarised in 
Table 2: 

5 TABLE 2 

Amino Acid Three -letter One -letter 

Abbreviation Symbol 



A 1 An i 


Ala 


A 


A T" CI i nine 


Arg 


R 




Asn 


N 


TV ^ ^ ^ a 1 

Asparuxc o-civj 


Asp 


D 


Cy s ce xne 


Cvs 


C 


T 111" St m T 


Gin 


Q 


CjXUuamj.C a.(-j.ia 


Glu 


E 


Gxycine 


Glv 


G 


Histiame 


His 


H 


Isoleucine 


He 


I 


Leucine 


Leu 


L 


Lysine 


Lys 


K 


Methionine 


Met 


M 


Phenylalanine 


Phe 


F 


Proline 


Pro 


P 


Serine 


Ser 


S 


Threonine 


Thr 


T 


Tryptophan 


Trp 


W 


Tyrosine 


Tyr 


Y 


Valine 


Val 


V 


Any residue 


Xaa 


X 
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TABLE 3 
SUMMARY OF SEQ ID NO. 

Sequence SEQ ID NO. 

5 Amino acid sequence WSXWS 1 
Oligonucleotide primers and probes listed 

in Example l 2-11 

Nucleotide sequence of NR6.1^ 12 

Amino acid sequence of NR6 . 1 13 

10 Nucleotide sequence of NR6.2^ 14 

Amino acid sequence of NR6 , 2 15 

Nucleotide sequence of NR6.3^ 16 

Amino acid sequence of NRG . 3 17 
Nucleotide sequence of products generated 
15 by 5N RACE of brain cDNA using NRG 

specific primers'' 18 

Amino acid sequence of SEQ ID NO: 18 19 
Nucleotide sequence unique to 5N RACE of 

brain cDNA 2 0 

2 0 Amino acid sequence for SEQ ID NO: 20 21 

Unspliced murine NR6 nucleotide sequence 22 

PCR product for human NR6 2 3 
Nucleotide sequence of clone HFK- 66 

encoding human NRG 24 

2 5 Amino acid sequence of SEQ ID NO: 24 2 5 

Oligonucleotide sequences UPl and LPl, 

respectively 26-27 

Genomic nucleotide sequence of murine NR6 2 8 

Amino acid sequence of SEQ ID NO: 28 29 

3 0 Murine NRG . 1 oligonucleotide primers 30, 31 

Murine XL- 3 signal sequence 32 
Linker sequence for mouse IL-3 signal 

sequence and FLAG epitope 33-35 
Genomic nucleotide sequence of murine NR6 

35 containing addi tonal 5N sequence 38 

Oligonucleotide 2199 and 2200, respectively 36, 37 

N-teirminal region of NRG 3 9 
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'The polyadenylation signal AATAAATAAA is at nucleotide 
position 1451 to 1460; NR6 . 1 (SEQ ID NO: 12) and NR6 . 2 
(SEQ ID NO: 14) are identical to nucleotide 1223 encoding 
Q4 07, the represents the end of an exon. NR6 . 1 splices 
5 out an exon present only in NR6 . 2 and uses a different 
reading frame for the final exon which is shared with 
NR6.2; this corresponds to amino acids VLPAKL at amino 
acid residue positions 408-413. The region of 3N- 
untranslated DNA shared by NR6 . 1 , NR6 . 2 and NR6.3 is 
10 from nucleotide 1240 to 1475. The WSXWS motif is at 
amino acid residues 330 to 334. 

^The polyadenylation signal AATAAA is at nucleotide 
positions 1494 to 1503. The WSXWS motif is at amino 

15 acid residues 330 to 334. NR6 . 1 and NR6 . 2 are identical 
to nucleotide 1223 encoding Q407 which represents the 
end of an exon. NR6 . 2 splices in an exon beginning at 
amino acid residue D408, nucleotide 1224 and ends at 
residue G422, nucleotide 1264. The region of 3N 

20 untranslated DNA shared by NR6 . 1 , NR6 . 2 and NR6 . 3 is 
from nucleotide position 1283 to 1517. 

'The nucleotide and amino acid numbering corresponds to 
SEQ ID NO: 12 and 14. The WSXWS motif is at amino acid 

25 residues 330 to 334. The polyadenylation signal 

AATAAATAAA is from nucleotide 1781 to 1780. NR6 . 1 , 
NR6.2 and NR6 . 3 are identical to nucleotide 1223 
encoding Q407, this represents the end of an exon. 
NR6.3 fails to splice from this position and, therefore, 

30 translation continues through the intron, giving rise to 
the C-terminal protein region from amino acid residues 
408 to 461. The region of 3N untranslated DNA shared by 
NR6.1, NR6.2 and NR6 . 3 is from nucleotide 1469 to 1804. 

*The nucleotide sequence is identical to NR6.1, NR6.2
and NR6.3 from nucleotide C151, the first nucleotide for
Pro51. The numbering from this nucleotide is the same
as for SEQ ID NO:14 and 16. The sequence 5' of this point
is unique to the products generated by 5' RACE, not being
found in NR6.1, NR6.2 and NR6.3, and is represented in
SEQ ID NOs:20 and 21.

^Structure of the murine genomic NR6 locus. The coding
exons of NR6 span approximately 11 kb of the mouse
genome. There are 9 coding exons separated by 8
introns:

exon 1      at least 239 nt
intron 1    5195 nt
exon 2      282 nt
intron 2    214 nt
exon 3      130 nt
intron 3    107 nt
exon 4      170 nt
intron 4    1372 nt
exon 5      158 nt
intron 5    68 nt
exon 6      169 nt
intron 6    2020 nt
exon 7      188 nt
intron 7    104 nt
exon 8      43 nt
intron 8    181 nt
exon 9      252 nt

Exon 1 encodes the signal sequence, exon 2 the Ig-like 
domain, exons 3 to 6 the hemopoietin domain. Exons 7, 8 
and 9 are alternatively spliced. 

The NR6 molecules of the present invention have a range of
utilities referred to in the subject specification.
Additional utilities include:

1. Identification of molecules that interact with NR6.

These may include:

a) a corresponding ligand using standard orphan receptor
techniques (26),

b) monoclonal antibodies that act either as receptor
antagonists or agonists,

c) mimetic or antagonistic peptides isolated using phage
display technology (27,28),

d) small molecule natural products that act either as
antagonists or agonists.

2. Development of diagnostics to detect
deletions/rearrangements in the NR6 gene.

The NR6 knock-out mice studies described herein provide a
useful model for this utility. There are also applications
in the field of reproduction. For example, people can be
tested for their NR6 status. NR6 +/- carriers might be
expected to give rise to offspring with developmental
problems.

EXAMPLE 1
Oligonucleotides

M116:   5' ACTCGCTCCAGATTCCCGCCTTTT 3' [SEQ ID NO:2]
M108:   5' TCCCGCCTTTTTCGACCCATAGAT 3' [SEQ ID NO:3]
M159:   5' GGTACTTGGCTTGGAAGAGGAAAT 3' [SEQ ID NO:4]
M242:   5' CGGCTCACGTGCACGTCGGGTGGG 3' [SEQ ID NO:5]
M112:   5' AGCTGCTGTTAAAC... 3' [SEQ ID NO:6]
WSDWS:  5' (A/G)CTCCA(A/G)TC(A/G)CTCCA 3' [SEQ ID NO:7]
WSEWS:  5' (A/G)CTCCA(C/T)TC(A/G)CTCCA 3' [SEQ ID NO:8]
1944:   5' AAGTGTGACCATCATGTGGAC 3' [SEQ ID NO:9]
2106:   5' GGAGGTGTTAAGGAGGCG 3' [SEQ ID NO:10]
2120:   5' ATGCCCGCGGGTCGCCCG 3' [SEQ ID NO:11]
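The degenerate WSDWS and WSEWS oligonucleotides listed above can be checked
against the peptide motifs they are named for. The following is a minimal
sketch, not part of the original method: it assumes the oligonucleotides are
antisense to the coding strand, expands each degenerate position, and
translates the reverse complement. The helper names (parse, expand,
encoded_peptides) and the reduced codon table are illustrative assumptions.

```python
from itertools import product

# Subset of the standard genetic code: only the codons needed for this check.
CODON_TABLE = {
    "TGG": "W",
    "AGT": "S", "AGC": "S",
    "GAT": "D", "GAC": "D",
    "GAA": "E", "GAG": "E",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Degenerate probe sequences as written in Example 1 (5' to 3').
WSDWS = "(A/G)CTCCA(A/G)TC(A/G)CTCCA"
WSEWS = "(A/G)CTCCA(C/T)TC(A/G)CTCCA"


def parse(oligo):
    """Turn 'XX(A/G)YY' notation into a list of allowed bases per position."""
    positions, i = [], 0
    while i < len(oligo):
        if oligo[i] == "(":
            j = oligo.index(")", i)
            positions.append(oligo[i + 1:j].replace("/", ""))
            i = j + 1
        else:
            positions.append(oligo[i])
            i += 1
    return positions


def expand(positions):
    """Enumerate every concrete sequence in the degenerate pool."""
    return ["".join(p) for p in product(*positions)]


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def translate(seq):
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def encoded_peptides(oligo):
    """Peptides encoded by the strand complementary to each pool member."""
    return {translate(reverse_complement(s)) for s in expand(parse(oligo))}


for name, oligo in (("WSDWS", WSDWS), ("WSEWS", WSEWS)):
    print(name, len(expand(parse(oligo))), encoded_peptides(oligo))
```

Under these assumptions each pool contains eight 15-mers, and every member is
complementary to a sequence encoding the named motif. Only the AGT and AGC
serine codons are represented in the pools as written.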








EXAMPLE 2
Isolation of initial NR6 cDNA clones using
oligonucleotides designed against the conserved WSXWS
motif found in members of the haemopoietin receptor
family

(i) A commercial adult mouse testis cDNA library cloned
into the UNI-ZAP bacteriophage (Stratagene, CA, USA;
Catalogue number 937308) was used to infect
Escherichia coli of the strain LE392. Infected bacteria
were grown on twenty 150 mm agar plates, to give
approximately 50,000 plaques per plate. Plaques were
then transferred to duplicate 150 mm diameter nylon
membranes (Colony/Plaque Screen, NEN Research Products,
MA, USA), bacteria were lysed and the DNA was denatured
and fixed by autoclaving at 100°C for 1 min with dry
exhaust. The filters were rinsed twice in 0.1% (w/v)
sodium dodecyl sulfate (SDS), 0.1 x SSC (SSC is 150 mM
sodium chloride, 15 mM sodium citrate dihydrate) at room
temperature and pre-hybridized overnight at 42°C in 6 x
SSC containing 2 mg/ml bovine serum albumin, 2 mg/ml
Ficoll, 2 mg/ml polyvinylpyrrolidone, 100 mM ATP, 10
mg/ml tRNA, 2 mM sodium pyrophosphate, 2 mg/ml salmon
sperm DNA, 0.1% (w/v) SDS and 200 mg/ml sodium azide.
The pre-hybridisation buffer was removed. 1.2 µg of the
degenerate oligonucleotides for hybridization (WSDWS;
Example 1) were phosphorylated with T4 polynucleotide
kinase using 960 mCi of γ-32P-ATP (Bresatec, S.A.,
Australia). Unincorporated ATP was separated from the
labelled oligonucleotide using a pre-packed gel
filtration column (NAP-5; Pharmacia, Uppsala, Sweden).
Filters were hybridized overnight at 42°C in 80 ml of
the prehybridisation buffer containing 0.1% (w/v) SDS,
rather than NP40, and 10^ - 10^ cpm/ml of labelled
oligonucleotide. Filters were briefly rinsed twice at
room temperature in 6 x SSC, 0.1% (v/v) SDS, twice for 30
min at 45°C in a shaking waterbath containing 1.5 l of
the same buffer and then briefly in 6 x SSC at room
temperature. Filters were then blotted dry and exposed
to autoradiographic film at -70°C using intensifying
screens, for 7 - 14 days prior to development.
Plaques that appeared positive on orientated duplicate
filters were picked, eluted in 1 ml of 100 mM NaCl, 10
mM MgCl2, 10 mM Tris.HCl pH 7.4 containing 0.5% (w/v)
gelatin and 0.5% (v/v) chloroform and stored at 4°C.
After 2 days LE392 cells were infected with the eluate
from the primary plugs and replated for the secondary
screen. This process was repeated until hybridizing
plaques were pure.
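The rinse and hybridization buffers above are all stated as multiples of SSC.
As a small worked example, the sketch below converts an SSC strength into
millimolar salt concentrations, using only the definition given in the text
(1 x SSC is 150 mM sodium chloride, 15 mM sodium citrate). The function name
ssc is an illustrative assumption.

```python
# 1 x SSC as defined in the text: 150 mM sodium chloride, 15 mM sodium citrate.
SSC_1X_MM = {"sodium chloride": 150.0, "sodium citrate": 15.0}


def ssc(strength):
    """Salt concentrations (mM) in SSC of the given strength, e.g. 6 or 0.1."""
    return {salt: round(conc * strength, 6) for salt, conc in SSC_1X_MM.items()}


for strength in (6, 0.1):
    print(f"{strength} x SSC:", ssc(strength))
```

On this arithmetic the 6 x SSC hybridization buffer is 900 mM sodium chloride
and 90 mM sodium citrate, while the 0.1 x SSC rinse is 15 mM and 1.5 mM
respectively.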

Once purified, positive cDNAs were excised from the ZAP
II bacteriophage according to the manufacturer's
instructions (Stratagene, CA, USA) and cloned into the
plasmid pBluescript. A CsCl purified preparation of the
DNA was made and this was sequenced on both strands.
Sequencing was performed using an Applied Biosystems
automated DNA sequencer, with fluorescent
dideoxynucleotide analogues according to the
manufacturer's instructions. The DNA sequence was
analysed using software supplied by Applied Biosystems.
Two clones isolated from the mouse testis cDNA library,
68-1 and 68-2, shared large regions of nucleotide
sequence identity and appeared to encode a novel member
of the haemopoietin receptor family. The inventors gave
the putative receptor the working name "NR6".

(ii) In a parallel series of experiments, a commercial
mouse brain cDNA library (STRATAGENE #967319, Balb/c
day-20, whole brain cDNA/Uni-ZAP XR Vector) was used to
infect E. coli strain XL1-Blue MRF'. Infected bacteria
were grown on 90 x 135 mm square agar plates to give about
25,000 plaques per plate. Plaques were then transferred
to positively charged nylon membranes, Hybond-N(+)
(Amersham RPN 203B), bacteria were lysed and the DNA was
denatured with 0.5 M NaOH, 1.5 M NaCl at room
temperature for 7 min. The membranes were neutralized
with 0.5 M Tris-HCl pH 7.2, 1.5 M NaCl, 1 mM EDTA at room
temperature for 10 min before the DNA fixation by UV
crosslinking.

A mixture of WSDWS and WSEWS oligonucleotide probes (SEQ
ID NOs:7 and 8) was labelled with [γ-32P]-ATP
(TOYOBO #PNK-104 Kination kit). The membranes from the
mouse brain cDNA library were then hybridized with the
mixture of WSDWS and WSEWS oligonucleotide probes in the
Rapid Hybridization Buffer (Amersham, RPN1636) at 42°C
for 16 hours. Filters were washed with 1 x SSC/0.1% (w/v)
SDS at 42°C before autoradiography. Plaques that
appeared positive on orientated duplicate filters were
picked and replated on E. coli XL1-Blue MRF', with the
process of immobilisation on nylon membranes,
hybridization of membranes with oligonucleotide probes,
washing and autoradiography repeated until pure plaques
had been obtained.

The cDNA fragment from pure positively hybridizing
plaques was isolated by excision with the helper phage
strain ExAssist according to the manufacturer's
instructions (Stratagene, #967319). Sequencing was
performed after the amplification with Ampli-Taq DNA
polymerase and Taq dideoxy terminator cycle sequencing
kit (Perkin Elmer, #401150) by 25 cycles of 96°C for 10
sec, 50°C for 5 sec, 60°C for 4 min followed by 60°C for
5 min with the sequencing primers on an ABI model 377
DNA sequencer.

One clone, MBC-8, from the mouse brain library shared
large regions of nucleotide sequence identity with both
the 68-1 and 68-2 clones isolated from the mouse testis
cDNA library.

(iii) In a third series of experiments, total RNA was
prepared from the mouse osteoblastic cell line, KUSA,
according to the method of Chirgwin et al. (15), and
poly(A)+ RNA was further purified by oligo(dT)-cellulose
chromatography (Pharmacia Biotech). Complementary DNA
was synthesized by oligo(dT) priming, inserted into the
UniZAP XR directional cloning vector (Stratagene), and
packaged into λ phage using Gigapack Gold (Stratagene),
yielding 1.25 x 10*^ independent clones.

Approximately 10^ clones were screened essentially as
described in (ii) above. Briefly, probes were labeled
with 32P using T4 polynucleotide kinase and
prehybridization was performed for 4 hr in the Rapid
hybridization buffer (Amersham LIFE SCIENCE) at 42°C.
Filters (Hybond N+, Amersham) were then hybridized for
19 hr under the same condition with the addition of
32P-labeled WSXWS mix oligonucleotides and washed 3 times.
The final wash was for 30 min in 1 x SSPE, 0.1% (w/v)
SDS at 42°C. Filters were then exposed with an
intensifying screen to Kodak X-OMAT AR film for 5 days.
Isolated clones were subjected to the in vivo excision
of pBluescript SK(-) phagemid (Stratagene), and plasmid
DNA was prepared by the standard method. DNA sequences
were determined using an ABI PRISM 377 DNA Sequencer
(Perkin Elmer) with appropriate synthetic
oligonucleotide primers. A clone, pKUSA166, shared large
regions of nucleotide sequence identity with the MBC-8,
68-1 and 68-2 clones isolated from the mouse brain and
testis cDNA libraries.

EXAMPLE 3
Isolation of further NR6 cDNA clones using probes
specific for NR6

(i) In order to identify other cDNA libraries
containing cDNA clones for NR6, the inventors performed
PCR upon 1 µl aliquots of λ-bacteriophage cDNA libraries
made from mRNA from various human tissues and using
oligonucleotides 2070 and 2057, designed from the
sequence of 68-1 and 68-2, as primers. Reactions
contained 5 µl of 10 x concentrated PCR buffer
(Boehringer Mannheim GmbH, Mannheim, Germany), 1 µl of
10 mM dATP, dCTP, dGTP and dTTP, 2.5 µl of the
oligonucleotides HyB2 and either T3 or T7 at a
concentration of 100 µg/ml, 0.5 µl of Taq polymerase
(Boehringer Mannheim GmbH) and water to a final volume
of 50 µl. PCR was carried out in a Perkin-Elmer 9600 by
heating the reactions to 96°C for 2 min and then for 25
cycles at 96°C for 30 sec, 55°C for 30 sec and 72°C for
2 min. PCR products were resolved on an agarose gel,
immobilized on a nylon membrane and hybridized with
32P-labelled oligonucleotide 1943 (SEQ ID NO:42).
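The thermal cycling programme described in (i) can be written down as data
and its programmed hold time added up. This is only an illustrative sketch:
the function name program_seconds is an assumption, and the total ignores the
time the instrument spends ramping between temperatures.

```python
def program_seconds(initial_hold_s, cycles, step_seconds, final_hold_s=0):
    """Total programmed hold time in seconds, ignoring ramping between steps."""
    return initial_hold_s + cycles * sum(step_seconds) + final_hold_s


# (temperature in degrees C, hold in seconds) for each step of one cycle,
# as stated in the text: 96 C for 30 sec, 55 C for 30 sec, 72 C for 2 min.
CYCLE_STEPS = [(96, 30), (55, 30), (72, 120)]

total_s = program_seconds(
    initial_hold_s=120,  # 96 C for 2 min before cycling
    cycles=25,
    step_seconds=[seconds for _, seconds in CYCLE_STEPS],
)
print(total_s, "s =", total_s / 60, "min")
```

With the values from the text, the programmed holds come to 4620 seconds,
or 77 minutes.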

In addition to the original library, a mouse brain cDNA
library appeared to contain NR6 cDNAs. These were
screened using 32P-labelled oligonucleotides 1944,
2106, 2120 (Example 1) or with a fragment of the
original NR6 cDNA clone from 68-1 (nucleotide 934 to the
end of NR6.1 in Figure 1) labelled with 32P using a
random decanucleotide labelling kit (Bresatec).
Conditions used were similar to those described in (i)
above except that for the labelled oligonucleotides,
filters were washed at 55°C rather than 45°C, while for
the NR6 cDNA fragment prehybridization and hybridization
were carried out in 2 x SSC and filters were washed in
0.2 x SSC at 65°C. Again, as described in (i) above,
positively hybridising plaques were purified, the cDNAs
were recovered and cloned into plasmids pBluescript II
or pUC19. Independent cDNA clones were sequenced on
both strands.

Using this procedure, 6 further clones, 68-5, 68-35,
68-41, 68-51, 68-77 and 73-23, were found to contain
large regions of sequence identity with 68-1, 68-2,
MBC-8 and pKUSA166.

In a parallel series of experiments, further screening
was performed with hybridization probes prepared from
the 1.7 kbp EcoRI-XhoI fragment excised from pKUSA166.
This fragment was excised and labeled with 32P by using
T7QuickPrime Kit (Pharmacia Biotech). Approximately
6x10^ clones were screened. Hybond N+ filters
(Amersham) were first prehybridized for 4 hr at 42°C in
50% (v/v) formamide, 5 x SSPE, 5 x Denhardt's solution,
0.1% (w/v) SDS, and 0.1 mg/ml denatured salmon sperm DNA.
Hybridization was for 16 hours under the same conditions
with the addition of 32P-labelled NR6 cDNA fragment
probes. Finally the filters were washed once for 1 hr in
0.2 x SSC, 0.1% (w/v) SDS at 65°C. Eight clones were
isolated, and phage clones were subjected to the in vivo
excision of the pBluescript SK(-) phagemid (Stratagene).
The plasmid DNAs were prepared by the standard method.
DNA sequences were determined by an ABI PRISM 377 DNA
Sequencer using appropriate synthetic oligonucleotide
primers.

Using this procedure, 8 further clones from the KUSA
library, which contained large regions of sequence
identity with 68-1, 68-2, MBC-8, pKUSA166, 68-5, 68-35,
68-41, 68-51, 68-77 and 73-23, were isolated.

EXAMPLE 4 
Isolation of genomic DNA encoding NR6 



DNA encoding the murine NR6 genomic locus was also
isolated using the 68-1 cDNA as a probe. Two positive
clones, 2-2 and 57-3, were isolated from a mouse 129/Sv
strain genomic DNA library cloned into λ FIX. These
clones were overlapping and the positions of the
restriction sites, introns and exons were determined in
the conventional manner. The regions of the genomic
clones containing exons and the intervening introns were
sequenced on both strands using an Applied Biosystems
automated DNA sequencer, with fluorescent
dideoxynucleotide analogues according to the
manufacturer's instructions. Figure 2 shows the
nucleotide sequence and corresponding amino acid
sequence of the translation regions. This is also shown
in SEQ ID NOs:30 and 31. Figure 3 provides the genomic
NR6 gene sequence but with additional 5' sequence. This
sequence is also represented in SEQ ID NO:38.
The coding exons of NR6 span approximately
11 kb of the mouse genome. There are 9 coding exons
separated by 8 introns:



exon 1      at least 239 nt
intron 1    5195 nt
exon 2      282 nt
intron 2    214 nt
exon 3      130 nt
intron 3    107 nt
exon 4      170 nt
intron 4    1372 nt
exon 5      158 nt
intron 5    68 nt
exon 6      169 nt
intron 6    2020 nt
exon 7      188 nt
intron 7    104 nt
exon 8      43 nt
intron 8    181 nt
exon 9      252 nt

Exon 1 encodes the signal sequence, exon 2 the Ig-like
domain, exons 3 to 6 the hemopoietin domain. Exons 7, 8
and 9 are alternatively spliced.
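The statement that the coding exons span approximately 11 kb can be checked
by adding up the exon and intron lengths listed above. The sketch below is a
simple worked sum. Because exon 1 is given only as "at least 239 nt", the
totals are lower bounds, and the function name locus_summary is an
illustrative assumption.

```python
# Lengths in nucleotides, taken from the exon/intron listing in the text.
# Exon 1 is given only as "at least 239 nt", so every total is a lower bound.
EXONS = [239, 282, 130, 170, 158, 169, 188, 43, 252]
INTRONS = [5195, 214, 107, 1372, 68, 2020, 104, 181]


def locus_summary(exons, introns):
    """Return total exon, intron and overall lengths in nucleotides."""
    exon_total = sum(exons)
    intron_total = sum(introns)
    return {
        "exon_nt": exon_total,
        "intron_nt": intron_total,
        "span_nt": exon_total + intron_total,
    }


summary = locus_summary(EXONS, INTRONS)
print(summary)
print(f"approx. {summary['span_nt'] / 1000:.1f} kb")
```

The listed lengths sum to 1631 nt of exon and 9261 nt of intron, 10892 nt in
total, which agrees with the stated figure of approximately 11 kb.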



EXAMPLE 5 

5' RACE analysis of NR6

5'-RACE was used to investigate the nature of the
sequence 5' of nucleotide 960, encoding Ile321 of NR6.1,
2 and 3. The nucleotide and corresponding amino acid
sequences are shown in SEQ ID NOs:12, 14 and 16,
respectively. 5'-RACE was performed using Advantage
KlenTaq polymerase (Clontech, cat no. K1905-1) on mouse
brain Marathon-ready cDNA (Clontech, cat no. 7450-1)
according to the manufacturer's instructions. Briefly,
the first rounds of amplification were performed using
5 µl of cDNA in a total volume of 50 µl, with 1 mM each of
the primers AP1&M116 [SEQ ID NO:2] or AP1&M159 [SEQ ID
NO:4] by 35 cycles of 94°C x 0.5 min, 68°C x 2.0 min on
GeneAmp 2400 (Perkin-Elmer). An amount of 5 µl of
50-fold diluted product from the first amplification was
then re-amplified; for the products generated with
primers AP1 and M116 [SEQ ID NO:2] in the first
amplification, 1 mM of the primers AP2&M108 [SEQ ID
NO:3] were used in the second amplification. For the
products generated with primers AP1 and M116 [SEQ ID
NO:2] in the first amplification, two separate secondary
reactions were performed, one reaction with 1 mM primers
AP2&M242 [SEQ ID NO:5] and the other with 1 mM primers
AP2&M112 [SEQ ID NO:6]. Amplification was achieved
using 25 cycles of 94°C x 0.5 min, 68°C x 2.0 min. These
samples were analyzed by agarose gel electrophoresis.
When a single ethidium bromide staining amplification
product was observed, it was purified by QIAquick PCR
purification kit according to the manufacturer's
instructions (Qiagen, cat no. DG-0281) and its sequence was
directly determined using both primers used in the
secondary amplification step, that is AP2 and either
M108 [SEQ ID NO:3], M242 [SEQ ID NO:5] or M112 [SEQ ID
NO:6].

EXAMPLE 6 

Cloning of NR6

From the initial screens of mouse brain and testis cDNA
libraries with the degenerate WSXWS oligonucleotides and
subsequent screening of cDNA libraries from mouse
testis, mouse brain and the KUSA osteoblastic cell line,
a total of 18 NR6 cDNAs have been isolated. Nucleotide
sequence of NR6 was also determined from 5' RACE analysis
of brain cDNA. Additionally, two murine genomic DNA
clones encoding NR6 have been isolated.

Comparison of the NR6 cDNA clones revealed a common
region of nucleotide sequence which included a 123 base
pair 5'-untranslated region and a 1221 base pair open
reading frame, stretching from the putative initiation
methionine, Met1, to Gln407 (SEQ ID NOs:12, 14 and 16,
respectively). Within this common open reading frame, a
haemopoietin receptor domain was observed which
contained the four conserved cysteine residues and the
five amino acid motif WSXWS typical of members of the
haemopoietin receptor family.

Further analyses revealed that after nucleotide 1221,
three different classes of NR6 cDNAs could be found;
these were termed NR6.1, NR6.2 and NR6.3 (SEQ ID NOs:12,
14 and 16, respectively). Each encoded a receptor that
appeared to lack a classical transmembrane domain and
would, therefore, be likely to be secreted into the
extracellular environment. Although the putative
C-terminal regions of the three classes of NR6 proteins
appear to be different, the cDNAs encoding them also had
a common 3'-untranslated region.

With regard to SEQ ID NOs:12, 14 and 16, the numbering of
both nucleotides and amino acids begins at the putative
initiation methionine. NR6.1 and NR6.2 are identical to
nucleotide 1223 encoding Q407; this represents the end
of an exon. NR6.1 splices out an exon present only in
NR6.2 and uses a different reading frame for the final
exon which is shared with NR6.2. The 3'-untranslated
region is shared by NR6.1, NR6.2 and NR6.3. NR6.2
splices in an exon starting with nucleotide 1224
encoding D408 and ending with nucleotide 1264 encoding
the first nucleotide in the codon for G422 and uses a
different reading frame for the final exon which is
shared with NR6.2 (see Figure 1). NR6.3 fails to splice
from position nucleotide 1224; therefore, translation
continues through the intron, giving rise to the
C-terminal protein region.
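Why inclusion or omission of one exon changes the reading frame of the shared
final exon can be illustrated with simple codon arithmetic. The sketch below
makes two assumptions that are not stated in this form in the text: that
nucleotide 1 is the A of the initiation codon, and that the exon included
only in NR6.2 is the 43 nt exon 8 of the genomic listing in Example 4. The
function names are illustrative.

```python
def codon_span(residue):
    """1-based first and last nucleotide of the codon for a residue,
    assuming nucleotide 1 is the first base of the initiation codon."""
    start = 3 * (residue - 1) + 1
    return start, start + 2


def frame_offset(extra_nt):
    """Shift in reading frame (0, 1 or 2) caused by extra_nt additional
    nucleotides upstream of a shared exon."""
    return extra_nt % 3


# The common open reading frame ends with the codon for Gln407.
print("Gln407 codon:", codon_span(407))
# The text places the first nucleotide of the G422 codon at 1264.
print("Gly422 codon:", codon_span(422))
# A 43 nt exon is not a multiple of three, so including it shifts the
# frame in which the following exon is read.
print("frame offset from a 43 nt exon:", frame_offset(43))
```

Under these assumptions the Gln407 codon occupies nucleotides 1219 to 1221,
matching the 1221 base pair common open reading frame, and the Gly422 codon
starts at nucleotide 1264 as stated. An exon whose length is not divisible by
three leaves a non-zero offset, which is consistent with the final exon being
read in different frames in NR6.1 and NR6.2.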

The sequence of NR6 cDNA products generated by 5'-RACE
amplification from mouse brain cDNA preparation is
shown in SEQ ID NO:18. The nucleotide sequence
identified using 5'-RACE appeared to be identical to the
sequence of cDNAs encoding NR6.1, NR6.2 and NR6.3 from
nucleotide C151, the first nucleotide for the codon for
Pro51. 5' of this nucleotide, the sequences diverged
and the sequence is unique, not being found in NR6.1,
NR6.2 or NR6.3. Additionally, there is a single
nucleotide difference, with the sequence from the RACE
containing a G rather than an A at nucleotide 475,
resulting in Thr159 becoming Ala.
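The stated consequence of the single nucleotide difference can be checked
with the standard genetic code. The sketch below assumes that nucleotide 1 is
the first base of the initiation codon, as the numbering convention in this
Example indicates; the function name locate is illustrative.

```python
def locate(nucleotide):
    """Map a 1-based coding nucleotide to (residue number, position in codon)."""
    residue = (nucleotide - 1) // 3 + 1
    position_in_codon = (nucleotide - 1) % 3 + 1
    return residue, position_in_codon


# Standard genetic code: all threonine codons are ACN, all GCN codons are alanine.
THR_CODONS = ["ACT", "ACC", "ACA", "ACG"]
ALA_CODONS = {"GCT", "GCC", "GCA", "GCG"}

residue, position = locate(475)
print("nucleotide 475 -> residue", residue, "codon position", position)

# Replace the first base (A) of each threonine codon with G.
mutated = ["G" + codon[1:] for codon in THR_CODONS]
print("mutated codons:", mutated)
print("all encode Ala:", all(codon in ALA_CODONS for codon in mutated))
```

Nucleotide 475 falls at the first position of codon 159, and an A to G change
at the first position of any threonine codon gives an alanine codon, in
agreement with the Thr159 to Ala change described.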



Analysis of the genomic clones revealed that they were
overlapping and contained exons encoding the majority of
the coding region of the three forms of NR6 (Figures 1,
2 and 3). These genomic clones contained exons
encoding from Asp50 (nucleotide 148) of the NR6 cDNAs.
Sequence 5' of this in the cDNAs, including the
5'-untranslated region and the region encoding Met1 to
Gln49 (SEQ ID NOs:12, 14 and 16), and the 5' end
predicted from analysis of 5' RACE products (SEQ ID
NO:18) were not present in the two genomic clones
isolated.

Analysis of the NR6 genomic DNA clones also provided an
explanation of the three classes of NR6 cDNAs found. It
is likely that NR6.1, NR6.2 and NR6.3 arise through
alternative splicing of NR6 mRNA (Figure 1). The last
amino acid residue that these different NR6 proteins are
predicted to share is Gln407. SEQ ID NO:18 shows that
Gln407 is the last amino acid encoded by the exon that
covers nucleotides g5850 to g6037 (see Figure 2).
Alternative splicing from the end of this exon (Figure
1) accounts for the generation of cDNAs encoding NR6.1
(SEQ ID NO:12), NR6.2 (SEQ ID NO:14) and NR6.3 (SEQ ID
NO:16). In the case of NR6.1, the region from g6038 to
g6425 is spliced out, leading to juxtaposition of g6037
and g6426. In the case of NR6.2, the region from g6038
to g6141 is spliced out, an exon from g6142 to g6183 is
retained and then this is followed by splicing out of
the region from g6183 to g6425. NR6.3 appears to arise
when there is no splicing from nucleotide g6038. For
all three forms, a secreted rather than transmembrane
form is generated; these differ, however, in their
predicted C-terminal region. The genomic NR6 sequence
with additional 5' sequence is shown in Figure 3.

EXAMPLE 7 

ESTs

Databases were searched with the murine NR6 sequence
corresponding to the unspliced version shown in SEQ ID
NO:16. The murine NR6 sequence used is shown in SEQ ID
NO:22.

The databases searched were : 

(i) dbEST - Database of Expressed Sequence Tags,
National Center for Biotechnology Information, National
Library of Medicine, 38A, 8N805, 8600 Rockville Pike,
Bethesda, MD 20894, USA. Phone: 0011-1-301-496-2475 Fax:
0015-1-301-480-9241.

(ii) DNA Data Bank of Japan DNA Database Release 3689.
Prepared by: Sanzo Miyazawa, Manager/Database
Administrator; Hidenori Hayashida, Scientific Reviewer;
Yukiko Yamazaki/Eriko Hatada/Hiroaki Serizawa,
Annotators/reviewers; Motono Horie/Shigeko Suzuki/Yumiko
Satao, Secretaries/typists. DNA Data Bank of Japan,
National Institute of Genetics, Center for Genetic
Information Research, Laboratory of Genetic Information
Analyses, 1111 Yata, Mishima, Shizuoka 411, Japan.






(iii) EMBL Nucleic Acid Sequence Data Bank Release
47.0.

(iv) EMBL Nucleic Acid Sequence Data Bank Weekly Updates
Since Release 44.

(v) Genetic Sequence Data Bank NCBI-GenBank Release 94,
National Center for Biotechnology Information, National
Library of Medicine, 38A, 8N805, 8600 Rockville Pike,
Bethesda, MD 20894, USA. Phone: 0011-1-301-495-2475 Fax:
0015-1-301-480-9241.

(vi) Cumulative Updates since NCBI-GenBank Release 88,
National Center for Biotechnology Information, National
Library of Medicine, 38A, 8N805, 8600 Rockville Pike,
Bethesda, MD 20894, USA.

The search of the databases with the murine probe
identified several ESTs having sequence similarity to
the probe. The ESTs were:

W66776 (murine sequence)
MMS839 (murine sequence)
AA014965 (murine sequence)
W46604 (human sequence)
W46603 (human sequence)
H14009 (human sequence)
N78873 (human sequence)
R87407 (human sequence).

EXAMPLE 8 

Isolation of 3' cDNA clones encoding human NR6

PCR products encoding human NR6 were generated using
oligonucleotides UP1 and LP1 (see below) based on human
ESTs (Genbank Acc:H14009, Genbank Acc:AA042914) that
were identified from databases searched with murine NR6
sequence (SEQ ID NO:22). PCR was performed on a human
fetal liver cDNA library (Marathon ready cDNA CLONTECH
#7403-1) using Advantage Klen Taq Polymerase mix
(CLONTECH #8417-1) in the buffer supplied at 94°C for
30 s and 68°C for 3 min for 35 cycles, followed by 68°C
for 4 min and then stopping at 15°C. A standard PCR
programme for the Perkin-Elmer GeneAmp PCR System 2400
thermal cycler was used. The PCR yielded a prominent
product of approximately 560 base pairs (bp; SEQ ID
NO:18), which was radiolabelled with [α-32P]dCTP using a
random priming method (Amersham, RPN 1607, Mega prime
kit) and used to screen a human fetal kidney 5'-STRETCH
PLUS cDNA library (CLONTECH #HL1150x). Library screens
were performed using Rapid Hybridisation Buffer
(Amersham, RPN 1636) according to manufacturer's
instructions and membranes washed at 65°C for 30 min in
0.1 x SSC/0.1% (w/v) SDS. Two independent cDNA clones
were obtained as lambda phage and subsequently subcloned
and sequenced. Both clones (HFK-63 and HFK-66)
contained 1.4 kilobase (kb) inserts that showed sequence
similarity with murine NR6. The sequence and
corresponding amino acid translation of HFK-66 is shown
in SEQ ID NO:24.

The translated protein sequence of clone HFK-66 shows
a high degree of sequence similarity with the mouse NR6.

OLIGONUCLEOTIDES

UP1: 5' TCC AGG CAG CGG TCG GGG GAC AAC 3' [SEQ ID NO:26]
LP1: 5' TTG CTC ACA TCG TCC ACC ACC TTC 3' [SEQ ID
NO:27]

EXAMPLE 9 
Genomic Structure of Human NR6 

Human genomic DNA clones encoding human NR6 were
isolated by screening a human genomic library (Lambda
FIX II, Stratagene 946203) with radiolabelled
oligonucleotides 2199 and 2200 (see below). These
oligonucleotides were designed based on human ESTs
(Genbank Acc:R87407, Genbank Acc:H14009) that were
identified from databases searched with murine NR6.
Filters were hybridised overnight at 37°C in 6 x SSC
containing 2 mg/ml bovine serum albumin, 2 mg/ml Ficoll,
2 mg/ml polyvinylpyrrolidone, 100 mM ATP, 10 mg/ml tRNA,
2 mM sodium pyrophosphate, 2 mg/ml salmon sperm DNA,
0.1% (w/v) SDS and 200 mg/ml sodium azide and washed at
65°C in 6 x SSC/0.1% SDS. Five independent genomic
clones were obtained and sequenced. The extent of
sequence obtained has determined that the clones overlap
and exhibit a similar genomic structure to murine NR6.
Exon coding regions are almost identical over the region
covered by the genomic clones while intron coding
regions differ, although the size of the introns are
The extent of known overlap is shown in



OLIGONUCLEOTIDES:

2199: 5' CCC ACG CTT CTC ATC GGA TTC TCC CTG 3' [SEQ ID
NO:36]

2200: 5' CAG TCC ACA CTG TCC TCC ACT CGG TAG 3' [SEQ ID
NO:37]

EXAMPLE 10

Northern Blot Analysis of Human NR6 mRNA Expression

Clontech Multiple Tissue Northern Blots (Human MTN Blot,
CLONTECH #7760-1, Human MTN Blot IV, CLONTECH #7766-1,
Human Brain MTN Blot II, CLONTECH #7755-1, Human Brain
MTN Blot III, CLONTECH #7750) were probed with a
radiolabelled 3' human NR6 cDNA clone, HFK-66 (SEQ ID
NO:24). The clone was labelled with [α-32P]dCTP using a
random priming method (Amersham, RPN 1607, Mega prime
kit). Hybridisation was performed in Express
Hybridisation Solution (CLONTECH H50910) for 3 hours at
67°C and membranes were washed in 0.1 x SSC/0.1% w/v SDS
at 50°C.

A 1.8 kb transcript was detected in a variety of human
tissues encompassing reproductive, digestive and neural
tissues. High levels were observed in the heart,
placenta, skeletal muscle, prostate and various areas of
the brain; lower levels were observed in the testis,
uterus, small intestine and colon. Photographs showing
these Northern blots are available upon request. This
expression pattern differs from the expression pattern
observed with murine NR6.

EXAMPLE 11
Mouse NR6 Expression Vectors

pEF-FLAG/mNR6.1

The mature coding region of mouse NR6.1 was amplified
using the PCR to introduce an in-frame Asc I restriction
enzyme site at the 5' end of the mature coding region
and an Mlu I site at the 3' end, using the following
oligonucleotides:

oligo 5'-AGCTGGCGCGCCTCCCGGGCGGATCGGGAGCCCAC-3' [SEQ
ID NO:30]

oligo 5'-AGCTACGCGTTTAGAGTTTAGCCGGCAG-3' [SEQ ID
NO:31]

The resulting PCR derived DNA fragment was then digested
with Asc I and Mlu I and cloned into the Mlu I site of
pEF-FLAG. Expression of NR6 is under the control of the
polypeptide chain elongation factor 1α promoter as
described (16) and results in the secretion, using the
IL3 signal sequence from pEF-FLAG, of N-terminal
FLAG-tagged NR6 protein.

pEF-FLAG was generated by modifying the expression
vector pEF-BOS as follows:

pEF-BOS (16) was digested with Xba I and a linker was
synthesized that encoded the mouse IL3 signal sequence
(MVLASSTTSIHTMLLLLLMLFHLGLQASIS) and the FLAG epitope
(DYKDDDDK). Asc I and Mlu I restriction enzyme sites
were also introduced as cloning sites. The sequence of
the linker is as follows:

MVLASSTTSIHTM
CTAGACTAGTGCTGACACAATGGTTCTTGCCAGCTCTACCACCAGCATCCACACCATG
    TGATCACGACTGTGTTACCAAGAACGGTCGAGATGGTGGTCGTAGGTGTGGTAC

LLLLLMLFHLGLQASIS, Asc I
CTGCTCCTGCTCCTGATGCTCTTCCACCTGGGACTCCAAGCTTCAATCTCGGCGCGCC
GACGAGGACGAGGACTACGAGAAGGTGGACCCTGAGGTTCGAAGTTAGAGCCGCGCGG

DYKDDDDK, Mlu I
AGGACTACAAGGACGACGATGACAAGACGCGTGCTAGCACTAGT
TCCTGATGTTCCTGCTGCTACTGTTCTGCGCACGATCGTGATCAGATC

The two oligonucleotides were annealed together and
ligated into the Xba I site of pEF-BOS to give pEF-FLAG.

pCOS1/FLAG/mNR6 & pCHO1/FLAG/mNR6

A DNA fragment containing the sequences encoding IL3
signal sequence/Flag/mNR6 and the poly(A) adenylation
signal from human G-CSF cDNA was excised from
pEF-FLAG/mNR6 using the restriction enzyme EcoR I. This DNA
fragment was then inserted into the EcoR I cloning site
of pCOS1 and pCHO1.

The pCOS1 and pCHO1 vectors were constructed as follows.
pCHO1 is also described in reference (17) but with a
different selectable marker.

pCOS1 was prepared by digesting HEF-12h-g"l (see Figure
24 of International Patent Publication No. WO 92/19759)
with EcoRI and SmaI and ligating the digestion product
with an EcoRI-NotI-BamHI adaptor (Takara 4510). The
resulting plasmid comprises an EF1α promoter/enhancer,
Neo^r marker gene, SV40E, ori and an Amp^r marker gene.
pCHO1 was constructed by digesting DHFR-PMh-grl (see
Figure 25 of International Patent Publication No. WO
92/19759) with PvuI and Eco47III and ligating same with
pCOS1 digested with PvuI and Eco47III. The resulting
vector, pCHO1, comprises an EF1α promoter/enhancer, a
DHFR marker gene, SV40E, ori and an Amp^r gene.

EXAMPLE 12 

mNR6 has been expressed as an N' Flag tagged protein
following transfection of CHO cells and as a C' Flag
tagged protein following transfection of KUSA cells; in
both cases varying levels of dimeric and aggregated NR6
were secreted.

EXAMPLE 13
Murine NR6 expression

NR6 expression studies were conducted in murine Northern
Blots. At the level of sensitivity used in the adult
mouse, NR6 expression was detected in salivary gland,
lung and testis. During embryonic development, NR6 is
expressed in fetal tissues from day 10 of gestation
through to birth. In cell lines, NR6 expression has
been observed in the T-lymphoid line CTLL-2 as well as
in FD-PyMT (FDC-P1 myeloid cells expressing polyoma
middle T gene), and fibroblastoid cells including bone
marrow and fetal liver stromal lines.

EXAMPLE 14

Expression, purification and characterisation of CHO and
KUSA mNR6

The methods provide for the production of a dimeric form
of CHO derived N' FLAG-mNR6 without refolding. All
other methods capable of producing NR6 are
encompassed by the present invention.

A. Production of CHO derived N' FLAG-mNR6 (dimeric
form)

(i) Protein Production 

To analyse structure and functional activity, a cDNA
fragment containing the entire coding sequence of murine
NR6 with an N-terminal FLAG (N' FLAG) sequence was
cloned into the EcoRI site of the expression vector
pCHO1. For stable production of N-terminal FLAG-tagged
NR6 the vector contains the DHFR (dihydrofolate
reductase) gene as a selective marker with the NR6 gene
under the control of an EF1α promoter. CHO cells were
transfected with the construct using a polycationic
liposome transfection reagent (Lipofectamine, GibcoBRL).

(ii) Lipofectamine transfection method

Using six well tissue culture plates, either 2 x 10^ KUSA
cells in 2 ml IMDM + 10% (v/v) FCS or 2 x 10^ CHO cells
in 2 ml α-MEM + 10% (v/v) FCS were cultured until 70%
confluent. 2 µg DNA diluted in 100 µl OPTI-MEM I (Gibco
BRL, USA) was mixed gently with 12 µl Lipofectamine
diluted in 100 µl OPTI-MEM I and incubated at room
temperature for 30 min to allow DNA complex formation.
DNA complexes were gently diluted in a total volume of
1 ml of OPTI-MEM I and overlaid onto washed KUSA or CHO
cell monolayers. A further 1 ml IMDM + 20% (v/v) FCS
(KUSA cells) or 1 ml α-MEM + 20% (v/v) FCS (CHO cells)
was added to transfected cells after 5 hours. At 24
hours, the culture medium was replaced with fresh
complete growth medium. At 48 hours after transfection,
selection was applied. A methotrexate resistant clone
secreting comparatively high levels of NR6 was selected
and expanded for further analysis.

(iii) Protein expression

CHO cells were grown to confluence in roller bottles in
nucleoside free α-MEM + 10% (v/v) FCS. Selection was
maintained by using 100 ng/ml methotrexate in the
conditioned media according to the manufacturer's
instructions. Expression was monitored by Biosensor and
harvesting was found to be optimal at 3 to 4 days.

B. Protein Analysis

(i) Biosensor analysis

Expression and purification were monitored by Biosensor
analysis (BIAcore(TM), Sweden) where anti FLAG peptide M2
antibody (Kodak Eastman, USA), specific for the FLAG
peptide sequence, was bound to the sensorchip. Fractions
were analysed for binding to the sensor surface
(resonance units) and the sample then removed from the
surface using 50 mM diethylamine pH 12.0 prior to
analysis of the next fraction. Immobilisation and
running conditions of the Biosensor follow the
manufacturer's instructions.

(ii) Protein Production

In order to generate and characterise NR6, conditioned
media (2 L) produced by CHO cells was harvested after
day 3, post confluence. Conditioned media was
concentrated using diafiltration with a 10,000 molecular
weight cut-off (Easy flow, Sartorius, Aus). At a volume
of 200 ml (i.e. 10 x concentrated) the sample was buffer
exchanged into 20 mM Tris, 0.15 M NaCl, 0.02% (v/v) Tween
20 pH 7.5 (Buffer A).

(iii) Immunoprecipitation and Western Blot analysis
of mNR6

Concentrated conditioned media (1 ml) was
immunoprecipitated with M2 affinity resin (20 µl, Kodak
Eastman). To characterise the structure of mNR6,
SDS PAGE was performed under reducing and non-reducing
conditions. Separation was performed on NOVEX
4-20% (v/v) Tris/glycine gradient gels and protein
transferred onto PVDF membrane. Western blots were probed
with biotinylated M2 antibody (primary, 1:500) and then
streptavidin peroxidase (secondary, 1:3000). Samples
were visualised by autoradiography using
electrochemiluminescence (ECL, Dupont, USA).

By regression analysis of prestained standards
(BIORAD, Aus.) the molecular weight of the monomeric
unit was calculated to be 65,000 daltons. Under non-
reducing conditions the molecular weight was calculated
to be 127,000 daltons, indicating that NR6 is a
disulphide linked dimer. A tetrameric complex running at
approximately 250,000 daltons was also observed. Although
a band running at approximately 50,000 daltons was
observed, no monomeric NR6 was detected under non-reducing
conditions, indicating that the majority of NR6 expressed
in this system is disulphide linked.
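
The molecular weight estimate from prestained standards can be sketched as a least-squares fit of log10(molecular weight) against relative mobility. The specification does not give the ladder values or the exact regression model, so the model choice, the function names and the ladder below are assumptions made purely for illustration; the ladder values are hypothetical and are not data from this specification.

```python
import math

def fit_log_mw(standards):
    """Least-squares fit of log10(MW) against relative mobility (Rf).

    standards: list of (rf, molecular_weight_in_daltons) pairs.
    Returns (slope, intercept) of log10(MW) = slope * rf + intercept.
    """
    n = len(standards)
    xs = [rf for rf, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_mw(rf, slope, intercept):
    """Interpolate the molecular weight of a band from its mobility."""
    return 10 ** (slope * rf + intercept)

# Hypothetical ladder (Rf, daltons) for illustration only.
HYPOTHETICAL_STANDARDS = [
    (0.10, 200000), (0.25, 116000), (0.40, 66000),
    (0.60, 31000), (0.80, 14400),
]

slope, intercept = fit_log_mw(HYPOTHETICAL_STANDARDS)
print(round(estimate_mw(0.40, slope, intercept)))
```

With estimates obtained in this way, the non-reducing value of 127,000 daltons is roughly twice the 65,000 dalton monomer estimate, which is the basis for describing the species as a dimer.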

(iv) Affinity Chromatography of mNR6

Concentrated conditioned media (200 ml) was applied to
M2 affinity resin (5 ml) under gravity. To enhance
recovery the unbound fraction was reapplied to the
column four times prior to extensive washing of the
column with 200 volumes of Buffer A. Biosensor analysis
indicates that approximately 20% of the M2 binding
originally present in the concentrate remains in the
unbound fraction. The bound fraction was eluted from the
column using an immunodesorbent (50 ml; Actisep,
Sterogene Labs, USA).

(v) Ion exchange and Desalting of mNR6

In order to buffer exchange mNR6 prior to anion exchange
chromatography, 10 ml batches of the eluted fraction (50
ml) were applied to an XK column (400 x 26 mm I.D.)
containing G25 Sepharose (Pharmacia, Sweden).
Chromatography was developed at 4 ml/min using an FPLC
(Pharmacia, Sweden) equipped with an online UV280 and
conductivity monitor. The mobile phase was 10 mM Tris,
0.1 M NaCl, 0.02% (v/v) Tween, pH 8.0. 10 ml fractions
were collected between 12.5 min and 25 min to optimise
recovery and removal of salt. Fractions were analysed by
Biosensor analysis and pooled according to binding.

All pooled active fractions were diluted with an equal
volume of 20 mM Tris, 0.02% (v/v) Tween, pH 8.5 (Buffer
B) and then loaded onto a Mono Q 5/5 (Pharmacia, Sweden)
at a flow rate of 2 ml/min. The column was washed with
buffer B. Elution was performed using a linear gradient
between buffer B and buffer B containing 0.6 M NaCl over
30 min at a flow rate of 1 ml/min. Fractions (1 minute)
were collected and analysed on the Biosensor and also by
SDS PAGE and Western blot analysis. Fractions 15 to 26
(approximately 0.4 M NaCl) appear to contain the majority
of mNR6 as indicated by the Biosensor.
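
The relationship between fraction number and nominal salt concentration in a linear gradient can be sketched as below. This is an illustration only: it assumes that fraction 1 begins when the gradient begins and it ignores system dead volume, neither of which is stated in the specification. The function names are hypothetical.

```python
def nacl_molar(t_min, start_m=0.0, end_m=0.6, gradient_min=30.0):
    """Nominal NaCl concentration (M) at time t_min into a linear gradient."""
    frac = min(max(t_min / gradient_min, 0.0), 1.0)  # clamp to the gradient
    return start_m + (end_m - start_m) * frac

def fraction_window(n, fraction_min=1.0):
    """(start, end) time in minutes of 1-based fraction n.

    Assumes fraction 1 starts at gradient start (an assumption).
    """
    return (n - 1) * fraction_min, n * fraction_min

for n in (15, 26):
    t0, t1 = fraction_window(n)
    print(n, round(nacl_molar(t0), 3), round(nacl_molar(t1), 3))
```

Under these assumptions fractions 15 to 26 span a nominal 0.28 M to 0.52 M NaCl, which brackets the approximate 0.4 M figure quoted above.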

C. Production of CHO derived N' FLAG-mNR6 (monomeric
form)

(i) Protein Production

A cDNA fragment containing the entire coding sequence of
murine NR6 with an N-terminal FLAG sequence was cloned
into the expression vector pCHO1 for production of N-
terminal FLAG-tagged protein. This vector contains a
neomycin resistance gene with expression of the NR6 gene
under the control of an EF1α promoter. This expression
construct was transfected into CHO cells using
Lipofectamine (Gibco BRL, USA) according to the
manufacturer's instructions. Transfected cells were
cultured in IMDM + 10% (v/v) FCS with resistant cells
selected in geneticin (600 µg/ml, Gibco BRL, USA). A
neomycin resistant clone secreting comparatively high
levels of NR6 was selected and expanded for further
analysis.

(ii) Protein expression

N' FLAG-NR6 expressed in serum free conditioned media
(10 litre) was harvested from transfected CHO cells.
Collected media was concentrated using a CH2
ultrafiltration system equipped with an S1Y10 cartridge
(Amicon, molecular weight cut-off 10,000). Preliminary
examination of the expressed product by reducing and
non-reducing SDS PAGE followed by Western blot analysis
was performed. Visualisation of the protein on Westerns
was specific to the primary antibody anti FLAG M2. Under
reducing conditions a band at approximately 65,000
daltons was observed. Under non-reducing conditions,
dimer and larger molecular weight aggregates were
observed. These are disulphide linked monomers as they
are not present in the reducing gel. Small amounts of
monomer appear to be present in non-reducing gels.

(iii) Affinity Chromatography of NR6

Concentrated conditioned media was applied to an anti
FLAG M2 affinity resin (100 x 16 mm I.D.). After washing
the unbound proteins off the column, the bound proteins
were eluted using FLAG peptide (60 µg/ml) in PBS.

(iv) Ion Exchange Chromatography of NR6

Eluted fractions from the affinity column were dialysed
overnight against 20 mM Tris-HCl pH 8.5 (buffer C)
containing 50 mM dithiothreitol (DTT) using 25,000 cut-
off dialysis tubing (Spectra/Por 7, Spectrum). The
dialysed fractions were loaded onto Mono Q 5/5
(Pharmacia, Sweden) previously equilibrated with buffer
C containing 5 mM DTT. Chromatography was developed
using a linear gradient between buffer C and buffer C
containing 1.0 M NaCl at a flow rate of 0.5 ml/min.

(v) Refolding of NR6

Fractions containing NR6 from the Mono Q were adjusted
to 50 mM DTT and left overnight at 4°C. To initiate
refolding, the sample was then dialysed against 50 mM
Tris-HCl (pH 8.5), 2 M urea, 0.1% (v/v) Tween 20, 10 mM
glutathione (reduced) and 2 mM glutathione (oxidised) at
a final protein concentration of 100 µg/ml. Folding
was carried out at ambient temperature with one change
of the buffer over 24 hours.

(vi) Reversed Phase High Performance Liquid
Chromatography (RP-HPLC)

The folded product was further purified by RP-HPLC using
a Vydac C4 resin (250 x 4.6 mm I.D.) previously
equilibrated with 0.1% (v/v) trifluoroacetic acid (TFA).
Elution was carried out using a linear gradient from 0
to 80% (v/v) acetonitrile / 0.1% (v/v) TFA at a flow
rate of 1 ml per minute.

D. pCHO1/NR6/FLAG

In order to determine the native N terminus of NR6, a C
terminal FLAG NR6 CHO cell line was established.

The plasmid pKUSA166 (murine NR6 cDNA cloned into the
EcoR I site of pBLUESCRIPT) was digested with BamH I to
remove the sequences encoding the last 15 amino acids of
murine NR6. Synthetic oligonucleotides which encode the
3' end of mouse NR6 followed by the FLAG peptide tag
were annealed and ligated into the BamH I site of
pKUSA166. The sequence of the oligonucleotides was as
follows:

  I  L  P  S  G  R  R  G  A  A  R  G  P  A  G  D  Y  K  D  D  D  D  K  *   [SEQ ID NO:34]

GATCTTGCCCTCGGGCAGACGGGGTGCGGCGAGAGGTCCTGCCGGCGACTACAAGGACGACGATGACAAGTAG   [SEQ ID NO:33]

    AACGGGAGCCCGTCTGCCCCACGCCGCTCTCCAGGACGGCCGCTGATGTTCCTGCTGCTACTGTTCATCCTAG   [SEQ ID NO:35]

The 5' end of the linker introduces a silent mutation
(CTG > TTG), to destroy the 5' BamH I site upon
insertion of the linker. The NR6 cDNA (with native
signal sequence) with the C-terminal FLAG was cut out of
pKUSA166 with EcoR I and BamH I and cloned into the EcoR
I - BamH I cloning sites of pCHO1. This vector results
in the secretion of NR6 protein with a C-terminal FLAG
tag (C' FLAG-mNR6).
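
As a consistency check on the linker design, the following sketch translates the sense oligonucleotide (SEQ ID NO:33) in the reading frame that begins after its first base and compares the annealing region with the complementary oligonucleotide (SEQ ID NO:35, written 3' to 5' as in the text). The choice of reading frame and the helper names are assumptions made for illustration; the codon table is the standard genetic code.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate complete codons of dna, one-letter code, '*' for stop."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def complement(dna):
    """Base-by-base complement (no reversal)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))

SENSE = ("GATCTTGCCCTCGGGCAGACGGGGTGCGGCGAGAGGTCCTGCCGGCGACTACAAGG"
         "ACGACGATGACAAGTAG")                      # SEQ ID NO:33, 5' to 3'
ANTISENSE_3_TO_5 = ("AACGGGAGCCCGTCTGCCCCACGCCGCTCTCCAGGACGGCCGCTGATGTTCCTGCT"
                    "GCTACTGTTCATCCTAG")           # SEQ ID NO:35, 3' to 5'

# Frame assumed to start at the second base of the sense oligonucleotide.
print(translate(SENSE[1:]))
# The first four sense bases (GATC) and the last four antisense bases
# are the single-stranded overhangs; the rest should pair exactly.
print(complement(SENSE[4:]) == ANTISENSE_3_TO_5[:-4])
```

The translation reproduces the peptide of SEQ ID NO:34 ending in the FLAG tag and a stop codon, and the TTG codon in second position is the CTG > TTG change described above.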

This vector results in the secretion of NR6 protein from
KUSA cells. The vector pCHO1 has been previously
described in (17) although with a different selectable
marker.

(i) Production of polyclonal NR6 antiserum

The following peptide from the N terminal area of NR6
was chosen for production of polyclonal antiserum to NR6:

VISPQDPTLLIGSSLQATCSIHGDTP [SEQ ID NO:39]

The peptide was conjugated to KLH and injected into
rabbits. Production and purification of the polyclonal
antibody specific to the NR6 peptide sequence follows
standard methods.

(ii) Protein expression

KUSA cells transfected with cDNA of C terminal tagged
mNR6 were grown to confluence in flasks (800 ml) using
IMDM media containing 10% (v/v) FBS. Conditioned media
(100 ml) was harvested 3-4 days post confluence.

(iii) Characterisation of NR6 by Immunoprecipitation
and Western blotting

In order to establish that NR6 with the predicted
sequence is produced in KUSA cells transfected with the
cDNA, Western blot analysis using both M2 antibody and
purified NR6 specific rabbit antibody was performed.
Conditioned media (1 to 5 ml) was immunoprecipitated
with M2 affinity resin (10-20 µl). After sufficient
time for binding, the beads were washed with MT-PBS and
NR6 was subsequently eluted with 100 µg/ml FLAG peptide
(40 µl, x 1, 5 minute incubation). The sample was then
subjected to reducing and non-reducing SDS PAGE followed
by Western blot analysis. Both purified NR6 polyclonal
antibody (purified by protein G) and M2 antibody
recognise a band under reducing conditions with a
molecular weight of approximately 65,000 daltons.
Since the two antibodies recognise residues at the N
terminus and C terminus respectively, it is reasonable to
assume that full length NR6 is produced. Biotinylation of
the respective antibodies by standard methods reduces the
background. Under non-reducing conditions polyclonal NR6
antibodies bind to a band with a molecular weight of
approximately 127,000 daltons, consistent with a dimeric,
disulphide linked form of NR6. Minor components of
tetrameric NR6 are present; no monomeric NR6 is evident
using polyclonal NR6 antibodies.

EXAMPLE 15
Generation of NR6 knockout mice

To construct the NR6 targeting vector, 4.1 kb of genomic
NR6 DNA containing exons 2 through to 6 was deleted and
replaced with a G418-resistance cassette, leaving 5' and
3' NR6 arms of 2.9 and 4.5 kb respectively. A 4.5 kb
XhoI fragment of the murine genomic NR6 clone 2.2
(Figure 3) containing exons 7, 8 and 3' flanking
sequence was subcloned into the XhoI site of pBluescript
generating pBSNR6Xho4.5. A 2.9 kb NotI-StuI fragment
within NR6 intron 1 from the same genomic clone was
inserted into NotI and EcoRV digested pBSNR6Xho4.5
creating pNR6-Ex2-6. This plasmid was digested with
ClaI, which was situated between the two NR6 fragments,
and following blunt ending, ligated with a blunted 6 kb
HindIII fragment from placZneo, which contains the
lacZ gene and a PGKneo cassette, to generate the final
targeting vector, pNR6lacZneo. pNR6lacZneo was
linearised with NotI and electroporated into W9.5
embryonic stem cells. After 48 hours, transfected cells
were selected in 175 µg/ml G418 and resistant clones
picked and expanded after a further 8 days.

Clones in which the targeting vector had recombined
with the endogenous NR6 gene were identified by
hybridising SpeI-digested genomic DNA with a 0.6 kb
XhoI-StuI fragment from genomic NR6 clone 2.2. This
probe (probe A, Figure 4), which is located 3' to the
NR6 sequences in the targeting vector, distinguished
between the endogenous (9.9 kb) and targeted (7.1 kb)
NR6 loci (Figure 5).

Genomic DNA was digested with SpeI for 16 hrs at 37°C,
electrophoresed through 0.8% (w/v) agarose, transferred
to nylon membranes and hybridised to 32P-labelled probe
in a solution containing 0.5 M sodium phosphate, 7% (w/v)
SDS, 1 mM EDTA and washed in a solution containing 40 mM
sodium phosphate, 1% (w/v) SDS at 65°C. Hybridising
bands were visualised by autoradiography for 16 hours at
-70°C using Kodak XAR-5 film and intensifying screens.
Two targeted ES cell clones, W9.5NR6-2-44 and W9.5NR6-4-
2, were injected into C57Bl/6 blastocysts to generate
chimeric mice. Male chimeras were mated with C57Bl/6
females to yield NR6 heterozygotes which were
subsequently interbred to produce wild-type (NR6+/+),
heterozygous (NR6+/-) and mutant (NR6-/-) mice. The
genotypes of offspring were determined by Southern blot
analysis of genomic DNA extracted from tail biopsies.
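
The genotype call from the Southern blot can be sketched as a simple rule on which of the two diagnostic bands is present: 9.9 kb for the endogenous locus and 7.1 kb for the targeted locus. The size tolerance, the genotype labels and the function name below are assumptions for illustration; the specification does not describe how band sizes were scored.

```python
ENDOGENOUS_KB = 9.9   # SpeI fragment from the endogenous NR6 locus
TARGETED_KB = 7.1     # SpeI fragment from the targeted NR6 locus

def call_genotype(band_sizes_kb, tolerance_kb=0.3):
    """Assign a genotype from the hybridising band sizes (in kb).

    tolerance_kb is a hypothetical allowance for sizing error.
    """
    has_wt = any(abs(b - ENDOGENOUS_KB) <= tolerance_kb for b in band_sizes_kb)
    has_targeted = any(abs(b - TARGETED_KB) <= tolerance_kb for b in band_sizes_kb)
    if has_wt and has_targeted:
        return "NR6+/-"
    if has_wt:
        return "NR6+/+"
    if has_targeted:
        return "NR6-/-"
    return "undetermined"

for bands in ([9.9], [9.8, 7.2], [7.1], []):
    print(bands, call_genotype(bands))
```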

Genotyping of mice at weaning from matings between NR6+/-
heterozygous mice derived from both targeted ES cell
clones revealed an absence of homozygous NR6-/- mutants.
As no unusual loss of mice was observed between birth
and weaning, this suggests that lack of NR6 is lethal
during embryonic development or immediately after birth.
Genotyping of embryonic tissues at various stages of
development suggests that death occurs late in gestation
(beyond day 16) or at birth.



EXAMPLE 16

Oligonucleotides

1943:
5' GTC CAA GTG CGT TGT AAC CCA 3'

2070:
5' GCT GAG TGT GCG CTG GGT CTC ACC 3'

2057:
5' GGC TCC ACT CGC TCC AGA 3'



Those skilled in the art will appreciate that the
invention described herein is susceptible to variations
and modifications other than those specifically
described. It is to be understood that the invention
includes all such variations and modifications. The
invention also includes all of the steps, features,
compositions and compounds referred to or indicated in
this specification, individually or collectively, and
any and all combinations of any two or more of said
steps or features.

SEQUENCE LISTING



(1) GENERAL INFORMATION:

    (i) APPLICANT: (Other than US) AMRAD OPERATIONS PTY LTD
        (US only) Douglas James HILTON, Nicos Antony
        NICOLA, Alison FARLEY, Tracey WILLSON, Jian-Guo ZHANG,
        Warren ALEXANDER, Steven RAKAR, Louis FABRI, Tetsuo
        KOJIMA, Masatsugu MAEDA, Yasumfumi KIKUCHI, Andrew NASH

    (ii) TITLE OF INVENTION: A NOVEL HAEMOPOIETIN RECEPTOR
        AND GENETIC SEQUENCES ENCODING SAME

    (iii) NUMBER OF SEQUENCES: 39

    (iv) CORRESPONDENCE ADDRESS:
        (A) ADDRESSEE: DAVIES COLLISON CAVE
        (B) STREET: 1 LITTLE COLLINS STREET
        (C) CITY: MELBOURNE
        (D) STATE: VICTORIA
        (E) COUNTRY: AUSTRALIA
        (F) ZIP: 3000

    (v) COMPUTER READABLE FORM:
        (A) MEDIUM TYPE: Floppy disk
        (B) COMPUTER: IBM PC compatible
        (C) OPERATING SYSTEM: PC-DOS/MS-DOS
        (D) SOFTWARE: PatentIn Release #1.0, Version #1.25

    (vi) CURRENT APPLICATION DATA:
        (A) APPLICATION NUMBER: PCT INTERNATIONAL APPLICATION
        (B) FILING DATE: 11-SEP-1997

    (vii) PRIOR APPLICATION DATA:
        (A) APPLICATION NUMBER: PO2246/96
        (B) FILING DATE: 11-SEP-1996

    (viii) ATTORNEY/AGENT INFORMATION:
        (A) NAME: HUGHES DR, E JOHN L
        (C) REFERENCE/DOCKET NUMBER: EJH/AF

    (ix) TELECOMMUNICATION INFORMATION:
        (A) TELEPHONE: +61 3 9254 2777
        (B) TELEFAX: +61 3 9254 2770



(2) INFORMATION FOR SEQ ID NO:1:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 5 amino acids
        (B) TYPE: amino acid
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Trp Ser Xaa Trp Ser

(2) INFORMATION FOR SEQ ID NO:2:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

ACTCGCTCCA GATTCCCGCC TTTT  24

(2) INFORMATION FOR SEQ ID NO:3:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

TCCCGCCTTT TTCGACCCAT AGAT  24



(2) INFORMATION FOR SEQ ID NO:4:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

GGTACTTGGC TTGGAAGAGG AAAT  24

(2) INFORMATION FOR SEQ ID NO:5:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

CGGCTCACGT GCACGTCGGG TGGG  24

(2) INFORMATION FOR SEQ ID NO:6:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 22 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

AGCTGCTGTT AAAGGGCTTC TC  22

(2) INFORMATION FOR SEQ ID NO:7:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 15 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: Oligonucleotide

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

(A/G)CTCCA(A/G)TC(A/G)CTCCA  15



(2) INFORMATION FOR SEQ ID NO:8:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 15 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: Oligonucleotide

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

(A/G)CTCCA(C/T)TC(A/G)CTCCA  15



(2) INFORMATION FOR SEQ ID NO:9:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 21 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

AAGTGTGACC ATCATGTGGA C  21



(2) INFORMATION FOR SEQ ID NO:10:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 18 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

GGAGGTGTTA AGGAGGCG  18

(2) INFORMATION FOR SEQ ID NO:11:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 18 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

ATGCCCGCGG GTCGCCCG  18



(2) INFORMATION FOR SEQ ID NO:12:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 1506 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (ix) FEATURE:
        (A) NAME/KEY: CDS
        (B) LOCATION: 1..1242

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

GGCACGAGCT TCGCTGTCCG CGCCCAGTGA CGCGCGTGCG GACCCGAGCC CCAATCTGCA  -64
CCCCGCAGAC TCGCCCCCGC CCCATACCGG CGTTGCAGTC ACCGCCCGTT GCGCGCCACC  -4
CCC  -1

ATG CCC GCG GGT CGC CCG GGC CCC GTC GCC CAA TCC GCG CGG CGG CCG  48
Met Pro Ala Gly Arg Pro Gly Pro Val Ala Gln Ser Ala Arg Arg Pro
1               5                   10                  15

CCG CGG CCG CTG TCC TCG CTG TGG TCG CCT CTG TTG CTC TGT GTC CTC  96
Pro Arg Pro Leu Ser Ser Leu Trp Ser Pro Leu Leu Leu Cys Val Leu
            20                  25                  30

GGG GTG CCT CGG GGC GGA TCG GGA GCC CAC ACA GCT GTA ATC AGC CCC  144
Gly Val Pro Arg Gly Gly Ser Gly Ala His Thr Ala Val Ile Ser Pro
        35                  40                  45

CAG GAC CCC ACC CTT CTC ATC GGC TCC TCC CTG CAA GCT ACC TGC TCT  192
Gln Asp Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser
    50                  55                  60

ATA CAT GGA GAC ACA CCT GGG GCC ACC GCT GAG GGG CTC TAC TGG ACC  240
Ile His Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr
65                  70                  75                  80

CTC AAT GGT CGC CGC CTG CCC TCT GAG CTG TCC CGC CTC CTT AAC ACC  288
Leu Asn Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr
                85                  90                  95

TCC ACC CTG GCC CTG GCC CTG GCT AAC CTT AAT GGG TCC AGG CAG CAG  336
Ser Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln Gln
            100                 105                 110

TCA GGA GAC AAT CTG GTG TGT CAC GCC CGA GAC GGC AGC ATT CTG GCT  384
Ser Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu Ala
        115                 120                 125

GGC TCC TGC CTC TAT GTT GGC TTG CCC CCT GAG AAG CCC TTT AAC ATC  432
Gly Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn Ile
    130                 135                 140

AGC TGC TGG TCC CGG AAC ATG AAG GAT CTC ACG TGC CGC TGG ACA CCG  480
Ser Cys Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro
145                 150                 155                 160

GGT GCA CAC GGG GAG ACA TTC TTA CAT ACC AAC TAC TCC CTC AAG TAC  528
Gly Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr
                165                 170                 175

AAG CTG AGG TGG TAC GGT CAG GAT AAC ACA TGT GAG GAG TAC CAC ACT  576
Lys Leu Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His Thr
            180                 185                 190

GTG GGC CCT CAC TCA TGC CAT ATC CCC AAG GAC CTG GCC CTC TTC ACT  624
Val Gly Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe Thr
        195                 200                 205

CCC TAT GAG ATC TGG GTG GAA GCC ACC AAT CGC CTA GGC TCA GCA AGA  672
Pro Tyr Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala Arg
    210                 215                 220

TCT GAT GTC CTC ACA CTG GAT GTC CTG GAC GTG GTG ACC ACG GAC CCC  720
Ser Asp Val Leu Thr Leu Asp Val Leu Asp Val Val Thr Thr Asp Pro
225                 230                 235                 240

CCA CCC GAC GTG CAC GTG AGC CGC GTT GGG GGC CTG GAG GAC CAG CTG  768
Pro Pro Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln Leu
                245                 250                 255

AGT GTG CGC TGG GTC TCA CCA CCA GCT CTC AAG GAT TTC CTC TTC CAA  816
Ser Val Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe Gln
            260                 265                 270

GCC AAG TAC CAG ATC CGC TAC CGC GTG GAG GAC AGC GTG GAC TGG AAG  864
Ala Lys Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp Lys
        275                 280                 285

GTG GTG GAT GAC GTC AGC AAC CAG ACC TCC TGC CGT CTC GCG GGC CTG  912
Val Val Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly Leu
    290                 295                 300

AAG CCC GGC ACC GTT TAC TTC GTC CAA GTG CGT TGT AAC CCA TTC GGG  960
Lys Pro Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly
305                 310                 315                 320

ATC TAT GGG TCG AAA AAG GCG GGA ATC TGG AGC GAG TGG AGC CAC CCC  1008
Ile Tyr Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His Pro
                325                 330                 335

ACC GCT GCC TCC ACC CCT CGA AGT GAG CGC CCG GGC CCG GGC GGC GGG  1056
Thr Ala Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly Gly
            340                 345                 350

GTG TGC GAG CCG CGG GGC GGC GAG CCC AGC TCG GGC CCG GTG CGG CGC  1104
Val Cys Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg Arg
        355                 360                 365

GAG CTC AAG CAG TTC CTC GGC TGG CTC AAG AAG CAC GCA TAC TGC TCG  1152
Glu Leu Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys Ser
    370                 375                 380

AAC CTT AGT TTC CGC CTG TAC GAC CAG TGG CGT GCT TGG ATG CAG AAG  1200
Asn Leu Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln Lys
385                 390                 395                 400

TCA CAC AAG ACC CGA AAC CAG GTC CTG CCG GCT AAA CTC TAAGGATAGG  1249
Ser His Lys Thr Arg Asn Gln Val Leu Pro Ala Lys Leu
                405                 410

CCATCCTCCT GCTGGGTCAG ACCTGGAGGC TCACCTGAAT TGGAGCCCCT CTGTACCATC  1309
TGGGCAACAA AGAAACCTAC CAGAGGCTGG GGCACAATGA GCTCCCACAA CCACAGCTTT  1369
GGTCCACATG ATGGTCACAC TTGGATATAC CCCAGTGTGG GTAAGGTTGG GGTATTGCAG  1429
GGCCTCCCAA CAATCTCTTT AAATAAATAA AGGAGTTGTT CAGGTAAAAA AAAAAAAAAA  1489
AAAAAAAAAA AAAAAAA  1506

(2) INFORMATION FOR SEQ ID NO:13:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 413 amino acids
        (B) TYPE: amino acid
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

Met Pro Ala Gly Arg Pro Gly Pro Val Ala Gln Ser Ala Arg Arg Pro
1               5                   10                  15

Pro Arg Pro Leu Ser Ser Leu Trp Ser Pro Leu Leu Leu Cys Val Leu
            20                  25                  30

Gly Val Pro Arg Gly Gly Ser Gly Ala His Thr Ala Val Ile Ser Pro
        35                  40                  45

Gln Asp Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser
    50                  55                  60

Ile His Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr
65                  70                  75                  80

Leu Asn Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr
                85                  90                  95

Ser Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln Gln
            100                 105                 110

Ser Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu Ala
        115                 120                 125

Gly Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn Ile
    130                 135                 140

Ser Cys Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro
145                 150                 155                 160

Gly Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr
                165                 170                 175

Lys Leu Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His Thr
            180                 185                 190

Val Gly Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe Thr
        195                 200                 205

Pro Tyr Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala Arg
    210                 215                 220

Ser Asp Val Leu Thr Leu Asp Val Leu Asp Val Val Thr Thr Asp Pro
225                 230                 235                 240

Pro Pro Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln Leu
                245                 250                 255

Ser Val Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe Gln
            260                 265                 270

Ala Lys Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp Lys
        275                 280                 285

Val Val Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly Leu
    290                 295                 300

Lys Pro Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly
305                 310                 315                 320

Ile Tyr Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His Pro
                325                 330                 335

Thr Ala Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly Gly
            340                 345                 350

Val Cys Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg Arg
        355                 360                 365

Glu Leu Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys Ser
    370                 375                 380

Asn Leu Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln Lys
385                 390                 395                 400

Ser His Lys Thr Arg Asn Gln Val Leu Pro Ala Lys Leu
                405                 410



(2) INFORMATION FOR SEQ ID NO: 14: 

25 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 154 9 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 
30 (D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



35 (ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..1278 

- 83 - 



SUBSTITUTE SHEET (RULE 26) 

BNSDOCID: <WO 981 1 225A2_L> 



wo 98/11225 



PCT/GB97/02479 



10 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

GGCACGAGCT TCGCTGTCCG CGCCCAGTGA CGCGCGTGCG GACCCGAGCC CCAATCTGCA -65 
CCCCGCAGAC TCGCCCCCGC CCCATACCGG CGTTGCAGTC ACCGCCCGTT GCGCGCCACC -5 
CCCA -1 



ATG CCC GCG GGT CGC CCG GGC CCC GTC GCC CAA TCC GCG CGG CGG CCG 48
Met Pro Ala Gly Arg Pro Gly Pro Val Ala Gln Ser Ala Arg Arg Pro
1 5 10 15

CCG CGG CCG CTG TCC TCG CTG TGG TCG CCT CTG TTG CTC TGT GTC CTC 96
Pro Arg Pro Leu Ser Ser Leu Trp Ser Pro Leu Leu Leu Cys Val Leu
20 25 30

GGG GTG CCT CGG GGC GGA TCG GGA GCC CAC ACA GCT GTA ATC AGC CCC 144
Gly Val Pro Arg Gly Gly Ser Gly Ala His Thr Ala Val Ile Ser Pro
35 40 45

CAG GAC CCC ACC CTT CTC ATC GGC TCC TCC CTG CAA GCT ACC TGC TCT 192
Gln Asp Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser
50 55 60

ATA CAT GGA GAC ACA CCT GGG GCC ACC GCT GAG GGG CTC TAC TGG ACC 240
Ile His Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr
65 70 75 80

CTC AAT GGT CGC CGC CTG CCC TCT GAG CTG TCC CGC CTC CTT AAC ACC 288
Leu Asn Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr
85 90 95

TCC ACC CTG GCC CTG GCC CTG GCT AAC CTT AAT GGG TCC AGG CAG CAG 336
Ser Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln Gln
100 105 110

TCA GGA GAC AAT CTG GTG TGT CAC GCC CGA GAC GGC AGC ATT CTG GCT 384
Ser Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu Ala
115 120 125

GGC TCC TGC CTC TAT GTT GGC TTG CCC CCT GAG AAG CCC TTT AAC ATC 432
Gly Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn Ile
130 135 140

AGC TGC TGG TCC CGG AAC ATG AAG GAT CTC ACG TGC CGC TGG ACA CCG 480
Ser Cys Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro
145 150 155 160

GGT GCA CAC GGG GAG ACA TTC TTA CAT ACC AAC TAC TCC CTC AAG TAC 528
Gly Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr
165 170 175

AAG CTG AGG TGG TAC GGT CAG GAT AAC ACA TGT GAG GAG TAC CAC ACT 576
Lys Leu Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His Thr
180 185 190

GTG GGC CCT CAC TCA TGC CAT ATC CCC AAG GAC CTG GCC CTC TTC ACT 624
Val Gly Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe Thr
195 200 205

CCC TAT GAG ATC TGG GTG GAA GCC ACC AAT CGC CTA GGC TCA GCA AGA 672
Pro Tyr Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala Arg
210 215 220

TCT GAT GTC CTC ACA CTG GAT GTC CTG GAC GTG GTG ACC ACG GAC CCC 720
Ser Asp Val Leu Thr Leu Asp Val Leu Asp Val Val Thr Thr Asp Pro
225 230 235 240

CCA CCC GAC GTG CAC GTG AGC CGC GTT GGG GGC CTG GAG GAC CAG CTG 768
Pro Pro Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln Leu
245 250 255

AGT GTG CGC TGG GTC TCA CCA CCA GCT CTC AAG GAT TTC CTC TTC CAA 816
Ser Val Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe Gln
260 265 270

GCC AAG TAC CAG ATC CGC TAC CGC GTG GAG GAC AGC GTG GAC TGG AAG 864
Ala Lys Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp Lys
275 280 285

GTG GTG GAT GAC GTC AGC AAC CAG ACC TCC TGC CGT CTC GCG GGC CTG 912
Val Val Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly Leu
290 295 300

AAG CCC GGC ACC GTT TAC TTC GTC CAA GTG CGT TGT AAC CCA TTC GGG 960
Lys Pro Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly
305 310 315 320

ATC TAT GGG TCG AAA AAG GCG GGA ATC TGG AGC GAG TGG AGC CAC CCC 1008
Ile Tyr Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His Pro
325 330 335

ACC GCT GCC TCC ACC CCT CGA AGT GAG CGC CCG GGC CCG GGC GGC GGG 1056
Thr Ala Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly Gly
340 345 350

GTG TGC GAG CCG CGG GGC GGC GAG CCC AGC TCG GGC CCG GTG CGG CGC 1104
Val Cys Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg Arg
355 360 365

GAG CTC AAG CAG TTC CTC GGC TGG CTC AAG AAG CAC GCA TAC TGC TCG 1152
Glu Leu Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys Ser
370 375 380

AAC CTT AGT TTC CGC CTG TAC GAC CAG TGG CGT GCT TGG ATG CAG AAG 1200
Asn Leu Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln Lys
385 390 395 400

TCA CAC AAG ACC CGA AAC CAG GAC GAG GGG ATC CTG CCT TCG GGC AGA 1248
Ser His Lys Thr Arg Asn Gln Asp Glu Gly Ile Leu Pro Ser Gly Arg
405 410 415

CGG GGT GCG GCG AGA GGT CCT GCC GGT TAAACTCTAA GGATAGGCCA 1295
Arg Gly Ala Ala Arg Gly Pro Ala Gly
420 425

TCCTCCTGCT GGGTCAGACC TGGAGGCTCA CCTGAATTGG AGCCCCTCTG TACCATCTGG 1355

GCAACAAAGA AACCTACCAG AGGCTGGGGC ACAATGAGCT CCCACAACCA CAGCTTTGGT 1415

CCACATGATG GTCACACTTG GATATACCCC AGTGTGGGTA AGGTTGGGGT ATTGCAGGGC 1475

CTCCCAACAA TCTCTTTAAA TAAATAAAGG AGTTGTTCAG GTAAAAAAAA AAAAAAAAAA 1535

AAAAAAAAAA AAAA



(2) INFORMATION FOR SEQ ID NO: 15 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 425 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Met Pro Ala Gly Arg Pro Gly Pro Val Ala Gln Ser Ala Arg Arg Pro
1 5 10 15

Pro Arg Pro Leu Ser Ser Leu Trp Ser Pro Leu Leu Leu Cys Val Leu
20 25 30

Gly Val Pro Arg Gly Gly Ser Gly Ala His Thr Ala Val Ile Ser Pro
35 40 45

Gln Asp Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser
50 55 60

Ile His Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr
65 70 75 80

Leu Asn Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr
85 90 95

Ser Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln Gln
100 105 110

Ser Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu Ala
115 120 125

Gly Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn Ile
130 135 140

Ser Cys Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro
145 150 155 160

Gly Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr
165 170 175

Lys Leu Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His Thr
180 185 190

Val Gly Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe Thr
195 200 205

Pro Tyr Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala Arg
210 215 220

Ser Asp Val Leu Thr Leu Asp Val Leu Asp Val Val Thr Thr Asp Pro
225 230 235 240

Pro Pro Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln Leu
245 250 255

Ser Val Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe Gln
260 265 270

Ala Lys Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp Lys
275 280 285

Val Val Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly Leu
290 295 300

Lys Pro Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly
305 310 315 320

Ile Tyr Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His Pro
325 330 335

Thr Ala Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly Gly
340 345 350

Val Cys Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg Arg
355 360 365

Glu Leu Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys Ser
370 375 380

Asn Leu Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln Lys
385 390 395 400

Ser His Lys Thr Arg Asn Gln Asp Glu Gly Ile Leu Pro Ser Gly Arg
405 410 415

Arg Gly Ala Ala Arg Gly Pro Ala Gly
420 425



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 938 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..468

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:



GGC ACC GTT TAC TTC GTC CAA GTG CGT TGT AAC CCA TTC GGG ATC TAT 48
Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly Ile Tyr
1 5 10 15

GGG TCG AAA AAG GCG GGA ATC TGG AGC GAG TGG AGC CAC CCC ACC GCT 96
Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His Pro Thr Ala
20 25 30

GCC TCC ACC CCT CGA AGT GAG CGC CCG GGC CCG GGC GGC GGG GTG TGC 144
Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly Gly Val Cys
35 40 45

GAG CCG CGG GGC GGC GAG CCC AGC TCG GGC CCG GTG CGG CGC GAG CTC 192
Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg Arg Glu Leu
50 55 60

AAG CAG TTC CTC GGC TGG CTC AAG AAG CAC GCA TAC TGC TCG AAC CTT 240
Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys Ser Asn Leu
65 70 75 80

AGT TTC CGC CTG TAC GAC CAG TGG CGT GCT TGG ATG CAG AAG TCA CAC 288
Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln Lys Ser His
85 90 95

AAG ACC CGA AAC CAG GTA GGA AAG TTG GGG GAG GCT TGC GTG GGG GGT 336
Lys Thr Arg Asn Gln Val Gly Lys Leu Gly Glu Ala Cys Val Gly Gly
100 105 110

AAA GGA GCA GAG GAA GAG AGA GAC CCG GGT GAG CAG CCT CCA CAA CAC 384
Lys Gly Ala Glu Glu Glu Arg Asp Pro Gly Glu Gln Pro Pro Gln His
115 120 125

CGC ACT CTT CTT TCC AAG CAC AGG ACG AGG GGA TCC TGC CCT CGG GCA 432
Arg Thr Leu Leu Ser Lys His Arg Thr Arg Gly Ser Cys Pro Arg Ala
130 135 140

GAC GGG GTG CGG CGA GAG GTA AGG GGG TCT GGG TGAGTGGGGC CTACAGCAGT 485
Asp Gly Val Arg Arg Glu Val Arg Gly Ser Gly
145 150 155

CTAGATGAGG CCCTTTCCCC TCCTTCGGTG TTGCTCAAAG GGATCTCTTA GTGCTCATTT 545

CACCCACTGC AAAGAGCCCC AGGTTTTACT GCATCATCAA GTTGCTGAAG GGTCCAGGCT 605

TAATGTGGCC TCTTTTCTGC CCTCAGGTCC TGCCGGCTAA ACTCTAAGGA TAGGCCATCC 665

TCCTGCTGGG TCAGACCTGG AGGCTCACCT GAATTGGAGC CCCTCTGTAC CTATCTGGGC 725

AACAAAGAAA CCTACCATGA GGCTGGGGCA CAATGAGCTC CCACAACCAC AGCTTTGGTC 785

CACATGATGG TCACACTTGG ATATACCCCA GTGTGGGTAA GGTTGGGGTA TTGCAGGGCC 845

TCCCAACAAT CTCTTTAAAT AAATAAAGGA GTTGTTCAGG TAAAAAAAAA AAAAAAAAAA 905

AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAA 938

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 155 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly Ile Tyr
1 5 10 15

Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His Pro Thr Ala
20 25 30

Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly Gly Val Cys
35 40 45

Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg Arg Glu Leu
50 55 60

Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys Ser Asn Leu
65 70 75 80

Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln Lys Ser His
85 90 95

Lys Thr Arg Asn Gln Val Gly Lys Leu Gly Glu Ala Cys Val Gly Gly
100 105 110

Lys Gly Ala Glu Glu Glu Arg Asp Pro Gly Glu Gln Pro Pro Gln His
115 120 125

Arg Thr Leu Leu Ser Lys His Arg Thr Arg Gly Ser Cys Pro Arg Ala
130 135 140

Asp Gly Val Arg Arg Glu Val Arg Gly Ser Gly
145 150 155



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 834 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..834

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:



CCC ACC CTT CTC ATC GGC TCC TCC CTG CAA GCT ACC TGC TCT ATA CAT 98
Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser Ile His
51 55 60 65

GGA GAC ACA CCT GGG GCC ACC GCT GAG GGG CTC TAC TGG ACC CTC AAT 146
Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr Leu Asn
70 75 80

GGT CGC CGC CTG CCC TCT GAG CTG TCC CGC CTC CTT AAC ACC TCC ACC 194
Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr Ser Thr
85 90 95

CTG GCC CTG GCC CTG GCT AAC CTT AAT GGG TCC AGG CAG CAG TCA GGA 242
Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln Gln Ser Gly
100 105 110

GAC AAT CTG GTG TGT CAC GCC CGA GAC GGC AGC ATT CTG GCT GGC TCC 290
Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu Ala Gly Ser
115 120 125 130

TGC CTC TAT GTT GGC TTG CCC CCT GAG AAG CCC TTT AAC ATC AGC TGC 338
Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn Ile Ser Cys
135 140 145

TGG TCC CGG AAC ATG AAG GAT CTC ACG TGC CGC TGG ACA CCG GGT GCA 386
Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro Gly Ala
150 155 200

CAC GGG GAG ACA TTC TTA CAT ACC AAC TAC TCC CTC AAG TAC AAG CTG 434
His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr Lys Leu
205 210 215

AGG TGG TAC GGT CAG GAT AAC ACA TGT GAG GAG TAC CAC ACT GTG GGG 482
Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His Thr Val Gly
220 225 230

CCC CAC TCA TGC CAT ATC CCC AAG GAC CTG GCC CTC TTC ACT CCC TAT 530
Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe Thr Pro Tyr
235 240 245 250

GAG ATC TGG GTG GAA GCC ACC AAT CGC CTA GGC TCA GCA AGA TCT GAT 578
Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala Arg Ser Asp
255 260 265

GTC CTC ACA CTG GAT GTC CTG GAC GTG GTG ACC ACG GAC CCC CCA CCC 626
Val Leu Thr Leu Asp Val Leu Asp Val Val Thr Thr Asp Pro Pro Pro
270 275 280

GAC GTG CAC GTG AGC CGC GTT GGG GGC CTG GAG GAC CAG CTG AGT GTG 674
Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln Leu Ser Val
285 290 295

CGC TGG GTC TCA CCA CCA GCT CTC AAG GAT TTC CTC TTC CAA GCC AAG 722
Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe Gln Ala Lys
300 305 310

TAC CAG ATC CGC TAC CGC GTG GAG GAC AGC GTG GAC TGG AAG GTG GTG 770
Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp Lys Val Val
315 320 325 330

GAT GAC GTC AGC AAC CAG ACC TCC TGC CGT CTC GCG GGC CTG AAG CCC 818
Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly Leu Lys Pro
335 340 345

GGC ACC GTT TAC TTC GTC CAA GTG CGT TGT AAC CCA TTC GGG ATC TAT 866
Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly Ile Tyr
350 355 360

GGG TCG AAA AAG GCG GGA 894
Gly Ser Lys Lys Ala Gly
365



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 278 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:



Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser Ile His
51 55 60 65

Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr Leu Asn
70 75 80

Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr Ser Thr
85 90 95

Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln Gln Ser Gly
100 105 110

Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu Ala Gly Ser
115 120 125 130

Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn Ile Ser Cys
135 140 145

Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro Gly Ala
150 155 200

His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr Lys Leu
205 210 215

Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His Thr Val Gly
220 225 230

Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe Thr Pro Tyr
235 240 245 250

Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala Arg Ser Asp
255 260 265

Val Leu Thr Leu Asp Val Leu Asp Val Val Thr Thr Asp Pro Pro Pro
270 275 280

Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln Leu Ser Val
285 290 295

Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe Gln Ala Lys
300 305 310

Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp Lys Val Val
315 320 325 330

Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly Leu Lys Pro
335 340 345

Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly Ile Tyr
350 355 360

Gly Ser Lys Lys Ala Gly
365



(2) INFORMATION FOR SEQ ID NO: 20: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 143 base pairs
(B) TYPE: nucleic acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

GGCATGAAGG CTTAGGGTGG GGATCGGTAG GACCCATGCA CCCAGAGAAA GGGACTGGTG 60

GCAACTTTCA AACTCTCTGG GGAAGGAAGA AGGGCTGAAA GAGG 104

ATG AAC GGG CTC AGA CAC AGC TGT AAT CAG CCC CCA GGA 143
Met Asn Gly Leu Arg His Ser Cys Asn Gln Pro Pro Gly
1 5 10

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

Met Asn Gly Leu Arg His Ser Cys Asn Gln Pro Pro Gly
1 5 10
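
The correspondence between a listed coding sequence and its listed translation can be checked mechanically. The following is a minimal sketch, not part of the original listing: it assumes the standard genetic code, and the names `CODON_TABLE`, `translate`, `SEQ20_CDS` and `SEQ21` are illustrative only. It translates the 39 coding nucleotides of SEQ ID NO: 20 (positions 105 to 143) and compares the result with SEQ ID NO: 21 written in one-letter code.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
# Assumption: the listing uses the standard code; names below are illustrative.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    seq = "".join(cds.split()).upper()  # drop the spaces between codons
    protein = []
    # Ignore any trailing partial codon.
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Coding portion of SEQ ID NO: 20 (nucleotides 105 to 143 of the listing).
SEQ20_CDS = "ATG AAC GGG CTC AGA CAC AGC TGT AAT CAG CCC CCA GGA"
# SEQ ID NO: 21: Met Asn Gly Leu Arg His Ser Cys Asn Gln Pro Pro Gly.
SEQ21 = "MNGLRHSCNQPPG"

print(translate(SEQ20_CDS) == SEQ21)  # prints True
```

The same check could be applied to the longer coding sequences in this listing to flag codons that disagree with the amino acid printed beneath them.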

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1930 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

GGCACGAGCT TCGCTGTCCG CGCCCAGTGA CGCGCGTGCG GACCCGAGCC CCAATCTGCA 60

CCCCGCAGAC TCGCCCCCGC CCCATACCGG CGTTGCAGTC ACCGCCCGTT GCGCGCCACC 120

CCCAATGCCC GCGGGTCGCC CGGGCCCCGT CGCCCAATCC GCGCGGCGGC CGCCGCGGCC 180

GCTGTCCTCG CTGTGGTCGC CTCTGTTGCT CTGTGTCCTC GGGGTGCCTC GGGGCGGATC 240

GGGAGCCCAC ACAGCTGTAA TCAGCCCCCA GGACCCCACC CTTCTCATCG GCTCCTCCCT 300

GCAAGCTACC TGCTCTATAC ATGGAGACAC ACCTGGGGCC ACCGCTGAGG GGCTCTACTG 360

GACCCTCAAT GGTCGCCGCC TGCCCTCTGA GCTGTCCCGC CTCCTTAACA CCTCCACCCT 420

GGCCCTGGCC CTGGCTAACC TTAATGGGTC CAGGCAGCAG TCAGGAGACA ATCTGGTGTG 480

TCACGCCCGA GACGGCAGCA TTCTGGCTGG CTCCTGCCTC TATGTTGGCT TGCCCCCTGA 540

GAAGCCCTTT AACATCAGCT GCTGGTCCCG GAACATGAAG GATCTCACGT GCCGCTGGAC 600

ACCGGGTGCA CACGGGGAGA CATTCTTACA TACCAACTAC TCCCTCAAGT ACAAGCTGAG 660

GTGGTACGGT CAGGATAACA CATGTGAGGA GTACCACACT GTGGGCCCTC ACTCATGCCA 720

TATCCCCAAG GACCTGGCCC TCTTCACTCC CTATGAGATC TGGGTGGAAG CCACCAATCG 780

CCTAGGCTCA GCAAGATCTG ATGTCCTCAC ACTGGATGTC CTGGACGTGG TGACCACGGA 840

CCCCCCACCC GACGTGCACG TGAGCCGCGT TGGGGGCCTG GAGGACCAGC TGAGTGTGCG 900

CTGGGTCTCA CCACCAGCTC TCAAGGATTT CCTCTTCCAA GCCAAGTACC AGATCCGCTA 960

CCGCGTGGAG GACAGCGTGG ACTGGAAGGT GGTGGATGAC GTCAGCAACC AGACCTCCTG 1020

CCGTCTCGCG GGCCTGAAGC CCGGCACCGT TTACTTCGTC CAAGTGCGTT GTAACCCATT 1080

CGGGATCTAT GGGTCGAAAA AGGCGGGAAT CTGGAGCGAG TGGAGCCACC CCACCGCTGC 1140

CTCCACCCCT CGAAGTGAGC GCCCGGGCCC GGGCGGCGGG GTGTGCGAGC CGCGGGGCGG 1200

CGAGCCCAGC TCGGGCCCGG TGCGGCGCGA GCTCAAGCAG TTCCTCGGCT GGCTCAAGAA 1260

GCACGCATAC TGCTCGAACC TTAGTTTCCG CCTGTACGAC CAGTGGCGTG CTTGGATGCA 1320

GAAGTCACAC AAGACCCGAA ACCAGGTAGG AAAGTTGGGG GAGGCTTGCG TGGGGGGTAA 1380

AGGAGCAGAG GAAGAGAGAG ACCCGGGTGA GCAGCCTCCA CAACACCGCA CTCTTCTTTC 1440

CAAGCACAGG ACGAGGGGAT CCTGCCCTCG GGCAGACGGG GTGCGGCGAG AGGTAAGGGG 1500

GTCTGGGTGA GTGGGGCCTA CAGCAGTCTA GATGAGGCCC TTTCCCCTCC TTCGGTGTTG 1560

CTCAAAGGGA TCTCTTAGTG CTCATTTCAC CCACTGCAAA GAGCCCCAGG TTTTACTGCA 1620

TCATCAAGTT GCTGAAGGGT CCAGGCTTAA TGTGGGCTCT TTTCTGCCCT CAGGTCCTGC 1680

CGGCTAAACT CTAAGGATAG GCCATCCTCC TGCTGGGTCA GACCTGGAGG CTCACCTGAA 1740

TTGGAGCCCC TCTGTACCTA TCTGGGCAAC AAAGAAACCT ACCATGAGGC TGGGGCACAA 1800

TGAGCTCCCA CAACCACAGC TTTGGTCCAC ATGATGGTCA CACTTGGATA TACCCCAGTG 1860

TGGGTAAGGT TGGGGTATTG CAGGGCCTCC CAACAATCTC TTTAAATAAA TAAAGGAGTT 1920

GTTCAGGTAA 1930

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 560 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

TCCAGGCAGC GGTCGGGGGA CAACCTCGTG TGCCACGCCC GTGACGGCAG CATCCTGGCT 60

GGCTCCTGCC TCTATGTTGG CCTGCCCCCA GAGAAACCCG TCAACATCAG CTGCTGGTCC 120

AAGAACATGA AGGACTTGAC CTGCCGCTGG ACGCCAGGGG CCCACGGGGA GACCTTCCTC 180

CACACCAACT ACTCCCTCAA GTACAAGCTT AGGTGGTATG GCCAGGACAA CACATGTGAG 240

GAGTACCACA CAGTGGGGCC CCACTCCTGC CACATCCCCA AGGACCTGGC TCTCTTTACG 300

CCCTATGAGA TCTGGGTGGA GGCCACCAAC CGCCTGGGCT CTGCCCGCTC CGATGTACTC 360

ACGCTGGATA TCCTGGATGT GGTGACCACG GACCCCCCGC CCGACGTGCA CGTGAGCCGC 420

GTCGGGGGCC TGGAGGACCA GCTGAGCGTG CGCTGGGTGT CGCCACCCGC CCTCAAGGAT 480

TTCCTTTTTC AAGCCAAATA CCAGATCCGC TACCGAGTGG AGGACAGTGT GGAATGGAAG 540

GTGGTGGACG ATGTGAGCAA 560



(2) INFORMATION FOR SEQ ID NO: 24 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1391 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..1053

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

ACC CTC AAC GGG CGC CGC CTG CCC CCT GAG CTC TCC CGT GTA CTC AAC 48
Thr Leu Asn Gly Arg Arg Leu Pro Pro Glu Leu Ser Arg Val Leu Asn
1 5 10 15

GCC TCC ACC TTG GCT CTG GCC CTG GCC AAC CTC AAT GGG TCC AGG CAG 96
Ala Ser Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln
20 25 30

CGG TCG GGG GAC AAC CTC GTG TGC CAC GCC CGT GAC GGC AGC ATC CTG 144
Arg Ser Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu
35 40 45

GCT GGC TCC TGC CTC TAT GTT GGC CTG CCC CCA GAG AAA CCC GTC AAC 192
Ala Gly Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Val Asn
50 55 60

ATC AGC TGC TGG TCC AAG AAC ATG AAG GAC TTG ACC TGC CGC TGG ACG 240
Ile Ser Cys Trp Ser Lys Asn Met Lys Asp Leu Thr Cys Arg Trp Thr
65 70 75 80

CCA GGG GCC CAC GGG GAG ACC TTC CTC CAC ACC AAC TAC TCC CTC AAG 288
Pro Gly Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys
85 90 95

TAC AAG CTT AGG TGG TAT GGC CAG GAC AAC ACA TGT GAG GAG TAC CAC 336
Tyr Lys Leu Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His
100 105 110

ACA GTG GGG CCC CAC TCC TGC CAC ATC CCC AAG GAC CTG GCT CTC TTT 384
Thr Val Gly Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe
115 120 125

ACG CCC TAT GAG ATC TGG GTG GAG GCC ACC AAC CGC CTG GGC TCT GCC 432
Thr Pro Tyr Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala
130 135 140

CGC TCC GAT GTA CTC ACG CTG GAT ATC CTG GAT GTG GTG ACC ACG GAC 480
Arg Ser Asp Val Leu Thr Leu Asp Ile Leu Asp Val Val Thr Thr Asp
145 150 155 160

CCC CCG CCC GAC GTG CAC GTG AGC CGC GTC GGG GGC CTG GAG GAC CAG 528
Pro Pro Pro Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln
165 170 175

CTG AGC GTG CGC TGG GTG TCG CCA CCC GCC CTC AAG GAT TTC CTC TTT 576
Leu Ser Val Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe
180 185 190

CAA GCC AAA TAC CAG ATC CGC TAC CGA GTG GAG GAC AGT GTG GAC TGG 624
Gln Ala Lys Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp
195 200 205

AAG GTG GTG GAC GAT GTG AGC AAC CAG ACC TCC TGC CGC CTG GCC GGC 672
Lys Val Val Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly
210 215 220

CTG AAA CCC GGC ACC GTG TAC TTC GTG CAA GTG CGC TGC AAC CCC TTT 720
Leu Lys Pro Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe
225 230 235 240

GGC ATC TAT GGC TCC AAG AAA GCC GGG ATC TGG AGT GAG TGG AGC CAC 768
Gly Ile Tyr Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His
245 250 255

CCC ACA GCC GCC TCC ACT CCC CGC AGT GAG CGC CCG GGC CCG GGC GGC 816
Pro Thr Ala Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly
260 265 270

GGG GCG TGC GAA CCG CGG GGC GGA GAG CCG AGC TCG GGG CCG GTG CGG 864
Gly Ala Cys Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg
275 280 285

CGC GAG CTC AAG CAG TTC CTG GGC TGG CTC AAG AAG CAC GCG TAC TGC 912
Arg Glu Leu Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys
290 295 300

TCC AAC CTC AGC TTC CGC CTC TAC GAC CAG TGG CGA GCC TGG ATG CAG 960
Ser Asn Leu Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln
305 310 315 320

AAG TCG CAC AAG ACC CGC AAC CAG CAC AGG ACG AGG GGA TCC TGC CCT 1008
Lys Ser His Lys Thr Arg Asn Gln His Arg Thr Arg Gly Ser Cys Pro
325 330 335

CGG GCA GAC GGG GCA CGG CGA GAG GTC CTG CCA GAT AAG CTG TAGGGGCTCA 1060
Arg Ala Asp Gly Ala Arg Arg Glu Val Leu Pro Asp Lys Leu
340 345 350

GGCCACCCTC CCTGCCACGT GGAGACGCAG AGGCCGAACC CAAACTGGGG CCACCTCTGT 1120

ACCCTCACTT CAGGGCACCT GAGCCCCTCA GCAGGAGCTG GGGTGGCCCC TGAGCTCCAA 1180

CGGCCATAAC AGCTCTGACT CCCACGTGAG GCCACCTTTG GGTGCACCCC AGTGGGTGTG 1240

TGTGTGTGTG TGAGGGTTGG TTGAGTTGCC TAGAACCCCT GCCAGGGCTG GGGGTGAGAA 1300

GGGGAGTCAT TACTCCCCAT TACCTAGGGC CCCTCCAAAA GAGTCCTTTT AAATAAATGA 1360

GCTATTTAGG TGCAAAAAAA AAAAAAAAAA A 1391

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 350 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Thr Leu Asn Gly Arg Arg Leu Pro Pro Glu Leu Ser Arg Val Leu Asn
1 5 10 15

Ala Ser Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln
20 25 30

Arg Ser Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu
35 40 45

Ala Gly Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Val Asn
50 55 60

Ile Ser Cys Trp Ser Lys Asn Met Lys Asp Leu Thr Cys Arg Trp Thr
65 70 75 80

Pro Gly Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys
85 90 95

Tyr Lys Leu Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His
100 105 110

Thr Val Gly Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe
115 120 125

Thr Pro Tyr Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala
130 135 140

Arg Ser Asp Val Leu Thr Leu Asp Ile Leu Asp Val Val Thr Thr Asp
145 150 155 160

Pro Pro Pro Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln
165 170 175

Leu Ser Val Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe
180 185 190

Gln Ala Lys Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp
195 200 205

Lys Val Val Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly
210 215 220

Leu Lys Pro Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe
225 230 235 240

Gly Ile Tyr Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His
245 250 255

Pro Thr Ala Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly
260 265 270

Gly Ala Cys Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg
275 280 285

Arg Glu Leu Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys
290 295 300

Ser Asn Leu Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln
305 310 315 320

Lys Ser His Lys Thr Arg Asn Gln His Arg Thr Arg Gly Ser Cys Pro
325 330 335

Arg Ala Asp Gly Ala Arg Arg Glu Val Leu Pro Asp Lys Leu
340 345 350



(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

TCCAGGCAGC GGTCGGGGGA CAAC

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

TTGCTCACAT CGTCCACCAC CTTC

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6663 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

CCCAGAACTC TTGGACGCTG AGGCAGGAGG ATTCCCAAGT TTCAAGACAG TGTGTTTCTA 60

GGTAATGAGA CCCTGTCAAG AAAAGAAAAG AAATAAAGAG ACAAGAAAAT GTTTATAGGC 120

TGTGAGACAG CTTGGTGGGT AAGGGGCACT TGCCTCCAAT CAAGATGACC TCAGCCCCAT 180

CCCTAGGAAT CCATGGTAGA AGGAGAAAGC AAACTCGCAG CTGCTGACCT CCATACATGT 240

GCTCCAATGT GCACACACAC AGGGAGACAT AATCAATTAA TAGGATGTAT TTGCTTAGAT 300

TTGAGTAGGC ATTTATGACT GATGTTTTAA AATTTTTATT TGATTTTATG AAAATATACC 360

TGTTTGTATT TGGTTTGGTT TGGTTTGAGT TTTGTTTATT TGAGACAGGG CTTCTCTGTG 420

TAGTCCTGGC TGTCCTTGGA ACTCACTCTG TAGACCAGGC TGGCCTTGAA CTCAGAAATC 480

CGCCTGCTTG TGCTTCCCAA GTGCTTAGAT TAAAGGTGTG CACTGCCATT CAGCAAAATT 540

GCATACTTTA ACCCCAGTAT TTGGGAGGCA GAGGCAGACT AATGTGTGAA TTCCAGGCTA 600

GCCAAGGATA CAGAGTGAGA CCCTATTCTT ACCCTCCCCC CCCAAAACCC CAAAATGTAT 660

TTTGTGCTTG TGTATGTACA TGTGTGTTGC AGCACGTAAA TGTCCAAGGA CAACTTGTAG 720

AAGTTCTCTC CGTTCACAGT CTAAGTCCTG AATTCAAACT AAGGTCCTCA GGCTTAGCCA 780

CAGTCTTCTT TATGTACTGA GCCATTTCAC TGGCCCTGGA TTGACTGATG AATTAATTTT 840

TGAGATAAGG TCTCTTGTAG CTCTAGCTAG GCTCAAACTA TGAACTCCCA AGGTCATCTT 900

GAGCTGCTGG TACTCTTGCT TCCACCCCAA GTGGTGGAAT GATACTCAGG CAGCACTTCT 960

CTGGGGAAGG GGCTGGCCTT GGCCTTGATT TTGTTGCCTC AGCTTCAATG AGTGCTTGGG 1020

TCTCGTTGTT TCTTTTCTTT ATCTGTGAAA TGGGTGAACA CCTGTTCAAG ACTTCCTGAC 1080

TCTTGAAACA TCCAGGCAGG GTGAGGGACT TGAAGTGGGC TCATCCCATG CCTAACAAAG 1140

TGTCGTCTTT GACCCCAGAC ACAGCTGTAA TCAGCCCCCA GGACCCCACC CTTCTCATCG 1200

GCTCCTCCCT GCAAGCTACC TGCTCTATAC ATGGAGACAC ACCTGGGGCC ACCGCTGAGG 1260

GGCTCTACTG GACCTTCAAT GGTCGCCGCC TGCCCTCTGA GCTGTCCCGC CTCCTTAACA 1320

CCTCCACCCT GGCCCTGGCC CTGGCTAACC TTAATGGGTC CAGGCAGCAG TCAGGAGACA 1380

ATCTGGTGTG TCACGCCCGA GACGGCAGCA TTCTGGCTGG CTCCTGCCTC TATGTTGGCT 1440

GTAAGTGGGG CCCCAGACAC TCAGAGATAG ATGGGGGTTG GCAATGACAG ATTTAGAGCC 1500

TGGGTCTTCT GTCCTGGGGC AGAGCCATGG GCTCTCACTT GCATGCAGGC ATGGTCATAC 1560

CCAGCACAGG CATTGCAACT CTAGGGACAG CTGTGGCTGC ACTGTCCCCT GTGTACCCCA 1620

CAGCTTTAGA AAAGCTGTCA TGTTTTCCTT GTAGTGCCCC CTGAGAAGCC CTTTAACATC 1680

AGCTGCTGGT CCCGGAACAT GAAGGATCTC ACGTGCCGCT GGACACCGGG TGCACACGGG 1740

GAGACATTCT TACATACCAA CTACTCCCTC AAGTACAAGC TGAGGTTGGT ACCCAGCCAA 1800

GCCTTGCTGT GTGACTTCTG GCAATACTTA CCTTCTCTGA TCAAATATGT TCCTGTTTAT 1860

GAACTCAAAA GGGACTCTCG CACCTCCACA GGTGGTACGG TCAGGATAAC ACATGTGAGG 1920

AGTACCACAC TGTGGGCCCT CACTCATGCC ATATCCCCAA GGACCTGGCC CTCTTCACTC 1980

CCTATGAGAT CTGGGTGGAA GCCACCAATC GCCTAGGCTC AGCAAGATCT GATGTCCTCA 2040

CACTGGATGT CCTGGACGTG GGTGAGCCCC CAGTGTCCAC CTGTGTTCTG CCCTAGACCT 2100

TATAGGGCGC CTCCCCCCCA TCCCCCCAGA CTTTTTGGTT CTTCTAGAGG TCTTAGCCAC 2160

AGCCACGGTG GTTGCAGGAC AGTGGTTGTT CATAACTTAA TGCAAAGACT TTCCCCCAAG 2220

ACAGTCAAGA TTTTTCCCCT CCCCACCCCC AACACACACA TACACACACA CTCTGCAGAG 2280

AACACCTGGC CTGACCACCC TCCCTCTCTA CAGCCCAGGT GTTCAGAAGG GAGTCCTAGG 2340

GGACTGAGAG GAGGCGCCCA GGTCTGAAGG CGCCCCAGGA AGCCGAGGCC TTGAGCTGGG 2400

GGGGGGGGCG AGGGTTGGAG GCACGAACTG GATGATCCCT GAGCACAACT GGGCCTAATC 2460

TAATTAGGGT GTTCCCAGCC CAAAGCAGCC TGGGCCATTT AACCCTTCAA GTGCCTCACT 2520

GAAGACTCAG GGGAGAGATC AGCTTGTACT CTCTCCATGG TCCCCCAGGA GGGTTCCTGG 2580

GTGCCCCTGG CTCATTCCCA CATCCAGAGG TTTTGTGTCT TCCTGGCATC TAACCCTCAG 2640

TTGTGCTCTG TGGCTGGCAC AGCTGCCCCG TGGAGGCTCT TGGTAATGTA CAAGGCATCA 2700

GAGGTGGACA TGGGATGGGG ATACATAGGG ATGGAGCCAA ATAGCACCTC AAGGTGGGGT 2760

GATATACAAT AAAGCTTGTC ACCCTGACGC TCAGAAAGCC TACTCATGAT GATCACAATT 2820

GTTGACATCA CTCTGGGACA TGTAGTGAGA CCCTAGCTCA AAACACAGAC AGTAGCTTTA 2880

AGAGTCAGCT TGTGACTTAA TACTGGAACT CAGGGCCTAA TAGGTGCTGG GTGATGCTCG 2940

CCTCACTCCC TGTTTAGTGA GATCTCTGCG CTAATCTCCA CCCCAGCTGG GTGGGCTGCT 3000

CTGTCCCCTT GAGGGCAGGA ATGTGTGTCT TCCATCAGAG ATAGGACCCG TGGTAGCAGC 3060

AACTGCTGCT GGCTGTTTCT GGAATATTAA ATGACAGTAA TCTATCAGGC CTGGGTGAGT 3120

AGCTAACAGG GGTGGGGGCG TGGTCTGGAA AACGCAGATA GGGTCATAGG AGCCACTGCA 3180

GCCTAGATTA CACCACTGGG TGTTCTGTCA CTAGGCCATT CTCACCAAGC AGTCCTCAGA 3240

ACTGGGAGCA CTGTTGCCAG CATTTAATGC CAGCATTTAA TGCCAGCATT AGGGGAGGCA 3300

GAGGCAGAAG GATCTCTCTG AGTTCAAGGC CATCCTGAAT TTACATAAAG AGCTCCAGGC 3360

CAGCCAGGGT GCGCAGTAAA ACCTTGTCTC AAAAAACAAA GCATCTTTAG TGACCAGGCT 3420

TGCTCCACCC CCAGTGACCA CGGACCCCCC ACCCGACGTG CACGTGAGCC GCGTTGGGGG 3480

CCTGGAGGAC CAGCTGAGTG TGCGCTGGGT CTCACCACCA GCTCTCAAGG ATTTCCTCTT 3540

CCAAGCCAAG TACCAGATCC GCTACCGCGT GGAGGACAGC GTGGACTGGA AGGTGCCCGT 3600

CCCGCCCCGG ACCCGCCCCT GACCCCGCCC CCCGCATCTG ACTCCTCCCT CACCGTGCAG 3660

GTGGTGGATG ACGTCAGCAA CCAGACCTCC TGCCGTCTCG CGGGCCTGAA GCCCGGCACC 3720

GTTTACTTCG TCCAAGTGCG TTGTAACCCA TTCGGGATCT ATGGGTCGAA AAAGGCGGGA 3780

ATCTGGAGCG AGTGGAGCCA CCCCACCGCT GCCTCCACCC CTCGAAGTGG TGAGCACCTC 3840

TCCAGGGCTG GCTGGCCCAT GGAATCCCCA ATCCATCCTG TTCCTTCCCC CCCACCCTTT 3900

TTTTGAGACA GCGTCTTCAG GTAGCGCATG CTGGCCTTAA ATTCAGTATG TAGTCAAGGA 3960

TGACCTCGAG CTCCTGGTCT TTTTGTCTCC ACTTAGAGAC AATGGCCAGT GGCCATCACC 4020

ACCTTTGGGA GACTAGCCAT GGAGTCTATT TAGCGTGTCA TTTGGTGACA GATGGAGTAC 4080

AACAGTGTGA CCTCTTGTAA GAGAACTGAA GACAGGCTGT TTTTAACCCC AATATCCTAG 4140

GCTCTCTAGA GGTTAACTTT ATATAAAATA GAGACTATTA CAGCCAGTTA TCACATGGTC 4200

CCACAGAACC TTTTGTCACA CAACCTATAG ACCACAGTGC CTGTGCCTAC CACATAAGGG 4260

TCTCTACTGC TGGCCCACCC CTCCAACCCT TAAAAGGTAA CCTAGGCAGC CTTAATATTT 4320

GCAATCCTCC TACCTCAGCC TCTTGAATGC TCAGAAACCA GGCATTAACC CAAGTTTCTC 4380

TTCTCTGGGT CCCTTTCTTA AGGTGGGAGG GCCTAAAGAT GACTTCCTTT GTCCTGAAGA 4440

CTCTCCGAGC CCATGGATCT GCACTCTCTA ATATGAAATA TATTGCATAA AATGTCTGGC 4500

CTCAGTTTCC CCACCTGTCA GGTTTAGGCA GCACAGTCGG TCCAAGACAC TTCATTATTT 4560

GCAGGCAGTA TAAGAAGAAG CTCCCATCCC CCACCCGCTT CCTCCGGTCC CTAAGACAGA 4620

ATACTTCTAC ACTGAAACTG AACTCTCGCA GACGCATATG CTCACTTTAA TGATGATGAA 4680

ATAATGGGGA AACTGAGGCT CCGAGAGATT CCTGGAGGAA GAGGGTCAAA ACCAGCTCCA 4740

GGAAGCTCTC CAGCCCCCAT CCGGGCCTCT CCAGGTTCTG GGCTTGGCGG GAGTGAACAC 4800

AGCTGGGAGG GGCTGGAGCC TGGGAGCTTT GGCCCTTGCT CGTGCCCAGC ACCTGCGATT 4860

CTTGCACGGG AGCCAGCAGG CGGCTGCGTC CGCCCGAGAG ACTGAAGAAG CCGGGGGTAG 4920

GGTTGGAGGG AGGTAAGCAG GGGCTGTGGG GGCCGAAGCT TGTGCCAGGG CCTGTCAGCG 4980

AGTCCCCAGT TTTATTTATG GCGTGAGGCC GATGTCCTTA TCCGCTGGCC TGCTGGGGGA 5040

TGGCTGCGGC TGGGGATTGG ACCCAAGGGC TGGCTTCCCA CTCAGTCCTC CAGCCCACTC 5100

CATGTCACAC CCGTGCATTC TCTGAGGCTT ATCTTGGGAA CCCGCCCTTG TTCTGTGCTG 5160

TCTGTCTCTA TTTCTGTCAT TCACTTTCCC AGAGCCTTTT TTTTATGCTT TTAATATAAC 5220

TACGTTTTAA AAATTGCTTT TGTATAATGT GTGTGCCTTC GTGAGCGTGC GTGCCACAAC 5280

ACACACGTGA AGGTTAGAGA ACTTTGTTGA GTAGGCTCCT TCCACCATGT GGGACTAGGG 5340

CTGGCGACAA GAGCAATTAC TGAGTCATCT CGCCAGCCCC TCACCCCTCA CTTCCCATCC 54 00 

5 TGTTTGGATA GTCATAGGTA ATCGAAGGTA AATCGCTGGC TTTAATTTCG TAGCTATCCT 54 60 

GCCTCAGCCT ACCAAGTGCT GTGCTACCAC GTTTGTGGGA GGGGCTCTCC TCCCAGTGTC 5 52 0 

TGGGGGTGAC ACAGTCCCAA GATCTCTGCT TTCTAGGTCT TTGTCTTAGT TTGCCCCTTG 5 5 80 

CTTTGTCCGT GTCCCTAGAG TCTCCGGCCC CACTTATCCA TTGACTGGTC TTTCCTTTAC 5 64 0 

CGAATACTCG GTTTTACCTC CCACTGATTT GACTCCCTCC TTTGCTTGTC TCCATCGCCG 5 70 0 

15 TGGCATTGCC ATTCCTCTGG GTGACTCTGG GTCCACACCT GACACCTTTC CCAACTTTCC 57 6 0 

CCAGCCGAAG CTGGTCTGGT ATGGGAGGCC GCCGTCCCGC GCGCGCCTCC TGCTGGCCGC 582 0 

GCCCCAACAC TGCCGCTCCA TTCTCTTTAG AGCGCCCGGG CCCGGGCGGC GGGGTGTGCG 588 0 

20 

AGCCGCGGGG CGGCGAGCCC AGCTCGGGCC CGGTGCGGCG CGAGCTCAAG CAGTTCCTCG 5940
GCTGGCTCAA GAAGCACGCA TACTGCTCGA ACCTTAGTTT CCGCCTGTAC GACCAGTGGC 6000
GTGCTTGGAT GCAGAAGTCA CACAAGACCC GAAACCAGGT AGGAAAGTTG GGGGAGGCTT 6060
GCGTGGGGGG TAAAGGAGCA GAGGAAGAGA GAGACCCGGG TGAGCAGCCT CCACAACACC 6120
GCACTCTTCT TTCCAAGCAC AGGACGAGGG GATCCTGCCC TCGGGCAGAC GGGGTGCGGC 6180
GAGAGGTAAG GGGGTCTGGG TGAGTGGGGC CTACAGCAGT CTAGATGAGG CCCTTTCCCC 6240
TCCTTCGGTG TTGCTCAAAG GGATCTCTTA GTGCTCATTT CACCCACTGC AAAGAGCCCC 6300
AGGTTTTACT GCATCATCAA GTTGCTGAAG GGTCCAGGCT TAATGTGGCC TCTTTTCTGC 6360
CCTCAGGTCC TGCCGGCTAA ACTCTAAGGA TAGGCCATCC TCCTGCTGGG TCAGACCTGG 6420
AGGCTCACCT GAATTGGAGC CCCTCTGTAC CATCTGGGCA ACAAAGAAAC CTACCAGAGG 6480
CTGGGCACAA TGAGCTCCCA CAACCACAGC TTTGGTCCAC ATGATGGTCA CACTTGGATA 6540
TACCCCAGTG TGGGTAGGGT TGGGGTATTG CAGCGCCTCC CAAGAGTCTC TTTAAATAAA 6600
TAAAGGAGTT GTTCAGGTCC CGATGGCCAG TGTGTTTGGG GCCTATGTGC TGGGGTGGGG 6660
GGA 6663

(2) INFORMATION FOR SEQ ID NO:29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 186 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

Asp Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser Ile
1               5                   10                  15

His Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr Phe
            20                  25                  30

Asn Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr Ser
        35                  40                  45

Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln Gln Ser
    50                  55                  60

Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu Ala Gly
65                  70                  75                  80

Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn Ile Ser
                85                  90                  95

Cys Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro Gly
            100                 105                 110

Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr Lys
        115                 120                 125

Leu Arg Leu Val Arg Ser Gly *   His Met *   Gly Val Pro His Cys
    130                 135                 140

Gly Pro Ser Leu Met Pro Tyr Pro Gln Gly Pro Gly Pro Leu His Ser
145                 150                 155                 160

Leu *   Asp Leu Gly Gly Ser His Gln Ser Pro Arg Leu Ser Lys Ile
                165                 170                 175

*   Cys Pro His Thr Gly Cys Pro Gly Arg
            180                 185



(2) INFORMATION FOR SEQ ID NO:30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:

AGCTGGCGCG CCTCCCGGGC GGATCGGGAG CCCAC 35

(2) INFORMATION FOR SEQ ID NO:31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 28 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:

AGCTACGCGT TTAGAGTTTA GCCGGCAG 28



(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:

Met Val Leu Ala Ser Ser Thr Thr Ser Ile His Thr Met Leu Leu Leu
1               5                   10                  15

Leu Leu Met Leu Phe His Leu Gly Leu Gln Ala Ser Ile Ser
            20                  25                  30



(2) INFORMATION FOR SEQ ID NO:33:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:

Ile Leu Pro Ser Gly Arg Arg Gly Ala Ala Arg Gly Pro Ala Gly Asp Tyr Lys Asp Asp
1               5                   10                  15                  20

Asp Asp Lys



(2) INFORMATION FOR SEQ ID NO:34:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 73 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:

GATCTTGCCC TCGGGCAGAC GGGGTGCGGC GAGAGGTCCT GCCGGCGACT ACAAGGACGA
CGATGACAAG TAG



(2) INFORMATION FOR SEQ ID NO:35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 73 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:

AACGGGAGCC CGTCTGCCCC ACGCCGCTCT CCAGGACGGC CGCTGATGTT CCTGCTGCTA
CTGTTCATCC TAG

(2) INFORMATION FOR SEQ ID NO:36:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:

CCCACGCTTC TCATCGGATT CTCCCTG 27

(2) INFORMATION FOR SEQ ID NO:37:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:

CAGTCCACAC TGTCCTCCAC TCGGTAG



(2) INFORMATION FOR SEQ ID NO:38:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11832 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:

GCGGCCGCTG CAGTGATTAC TCACCGCGTG GCGCACCCCA CCCGCGGGCC GCTGAGTGGA 
TTTTTCCGTG GGGGGATGTG AAGAAGTTTA GGGAGAACTC TTCTGCACCG ATGGGAACTA 
GGAATGCAGG GTTCGGTCCC GTTCCCCAAA GGACACACCT CTCCCCATAA GCCCACTCAT 
AAGGGCTCCC TGCACGCGCT CCGGGACATC CCCATATCCA ATACCCGCAG ATATGATAGT 
TGAGAAGGGA CCAGAGGCCG GAGACTCCCT CCCTGCCTTC TGGCTTTCCC CCCCCCCTGC 
ACGAAACGAG ACTACAGCGA TGGGAGAGGT GGCATGAAGG CTTAGGGTGG GGATCGGTAG 
GACCCATGCA CCCAGAGAAA GGGACTGGTG GCAACTTTCA AACTCTCTGG GGAAGGAAGA 
AGGGCTGAAA GAGGATGAAC GGGCTCAGGT ACTGCTCAAT GTGTGTGTGG CGGACCAAAG 
TGGGTATGGG GGCCCCGTAA GAGGGGCGGG GAAGGTGGAT AGGAAGGATC CCGGTAGACT 
GGAGGGGATC CTGGAAAAGC ACCAGGGCTG CGAGCTAGGA ACCCATTCGG AGTTAAGGGT 
ACAGGATCCC AGATGAGGGG GTGGGAAGCC TGGGACGGGC GGGACCAGAG AGGGAGGTCC 
CACGGGCTGG TGGGGAAAGA GTGGGGGGCT TCGCGCAGGA GGATGGGACG TTCAGGAGTG 
GTAACTGGGC GGAGGCCGGC CGGGCGGGGC GCGCGGTGCC CGCGGGCGGT GGGAAGGCCG 
GTGCGGGGCC CACGATCAAC CCCCCCCCAG GGGCCGGGCC GGGCCGGGGG CGGGGCCGGG 
CGGGGCGAGC GGCGCATTAG CGCCTTGTCA ATTTCGGCTG CTCAGACTTG CTCCGGCCTT 
CGCTGTCCGC GCCCAGTGAC GCGCGTGAGG ACCCGAGCCC CAATCTGCAC CCCGCAGACT 
CGCCCCCGCC CCATACCGGC GTTGCAGTCA CCGCCCGTTG CGCGCCACCC CCATGCCCGC 
GGGTCGCCCG GGCCCCGTCG CCCAATCCGC GCGGCGGCCG CCGCGGCCGC TGTCCTCGCT
GTGGTCGCCT CTGTTGCTCT GTGTCCTCGG GGTGCCTCGG GGCGGATCGG GAGCCCGTGA 1140
GTACCGTGCG CCCTGCTCCC CACCTCCCCA GGGAAGCCGG GATCCGGCGC CCCGGGGGGT 1200
AGTCGCGGGG GATGGAAGAA GGGGCGCGAG CGCCACCTGG ACGTCCCGGG AACAAAGGAA 1260
GGCGGCCCrC GGGGCGCCCT CACCTGTGGG GCTCATGGCA CCACCACCCA GCCTCCCAAG 1320
AGTACCCCGT TATACATCAG AGGCCTCTTA TCTGTATCCC CTTTGCGAGG CTGTCTGGCC 1380
AGGCTCAGTT TGAAGGACAT CGCAGTGTCC TGGGACCCCC CTCCTTCAGG GTGCTGGGAC 1440
GCTTCGGGGC GCACGCCTGT GTCTTGGATA TCAGAGCGGA AGGGAAGCCT CCCTGGCCGG 1500
GGGCGCACGC TTGGGTGCGT TGGGTTGGGT GCTGGCGCAA AGTGGGGTCC CCTCCCCCAT 1560
GAAGTGATGA TCCCCGGGGG GAGGGTGGGG CGTTATCGTG AGCCCTCCTG TCCGCCTGGC 1620
ATGCGGCCCG GCGTCCCTCG GGACTTGCCT CTCCGTGGGG TCGGCGCCGC CCCCTCCCCC 1680
CTATAGCAGA CTCCATGCTT TGGTATCCTC GAAGTCCTCT CCACTGGTGG GGCTCACAAC 1740
CGGTCTCATT CAGGCTGCGC TGGGTTGAGA GCCTCTAGCG ACTGAAATTT CGGTGAGGAG 1800
CGAGAGCAAG CGTGTCCGGG CACCGCGAGC CCAGACTTCA TTGTCTAAGG GGCACCCAGT 1860
GGGGGTCAGC TGCCGAGAGA ATCCCACTGT CCCAGGAGGA ACTCCTGGCC TTGAGCCCCC 1920
ATCACCCAAC GCACACATCC CCGCCAGGAT GCGGTCTCCA CATCCAGACC CTCTCTGGGA 1980
CACACCCAAA GACACACAAA AGAGCCCCAC TGGCTTATGT CCCGTCACCC TGCCCTCCGA 2040
CGCGCGCTGC AGCCCAGATG CGTATTCGCA CACCATCGCG GCGCTCGCAT TCCATCCTCT 2100
ACACACACAC ACACACACAC ACACACACAC ACACACACAC ACACACACAC ACGCACACAC 2160
ACACGCACGC ACACACACGC ACGCCCGCAC TCGTGGTCCC ACATTTATTT CACAGGGGAG 2220
GCAACACCGG GGTACGCATA TGGTTGAGTG CACTGGAGAT CTTTCCCCAC CACTCTCAGG 2280
ACCCCATCCG GAGACACAGG CCACACCGCA GGGGCACCAC GCTGCGCTGC TGCTCTGGGC 2340
TAGTAGTCTT GTGCAGTTTG TCCGCGGTGT CTGTGGACGC CCTCCCGCTC TTGTCAGGGG 2400
ACAGGAACCT ACACTCCTGC TTGCCCAAGG CGGCTGGGCA GGTGATGTGG TGACACCCGG 2460
GACCTTTCCG GGGAGTTGGT GTTGCTGCCA AGCCTGGGTA GTTTTTGAAT GCCACCAATA 2520
GCGCTAAGCT TTGTTTCCGG GCGGGCTGCA GAGCAACAGG CGAAGGTGGC GGAGTGGGGG 2580
TGGCGCGTGT GTTTTTTCTT TTAAGGGGGA GAGAAATTAA ATAAGAGGTT CTCACACCTC 2640
TGCAATCTGT TTGTACTTAC CGTGTGTCTT AACACCTGAC CAGCCAGCCG GTGGGTCGTA 2700
AAAGTGTATG CAGGTACCAG CGGGACAGGA GATGGGGGCC CCTGGGGTAT GGCTGGGATG 2760
GAGGCCACCT TCCCGTTGGC CTTTCAGGGA ATCTCACACT TTTCCCTTTT AAAACACATG 2820
GTGTTCTTTT TAATAACGGC AGCAACTCCG CATTGGGAAA GGGGGAAATA AGCTTGTATA 2880
GGCCCCGGCT TTGTGGAAAG GAGGGGAAGA GGGAAGAAAA AAGGAGGGGT GTCTCCTCCA 2940
GGCTTAGGGG GCTGTCAGCT GCTGCTCTGT CTAGCTTGGC ATGTGTGTGC CCCAGTCCCC 3000
AGTGGCTTTG GCCCATTGTT TGTGGAAGCC AAGAGGGAGA CTGGAGTCCT CTATCTCTGG 3060
TACTCCAGAG TCAGGCTTCT CAGTCCGAGC CCAGAGAACG TCTTCCCTGT TTTATGGAGG 3120
GAATCAGGGA AGGGGGTGCC AGGTGGACTA CGTTCTGCTG AGGACTGTAC CAGTCGCTCG 3180
AAGGAGAAAG CTTGGGCTTG CCCCCCTCCC CCCTCAAGCC ACGAAGGGCA GCTGCTAGGC 3240
TAGTGTGGTA AAAGGGCATT ACTCCCCAGC CAGGACCCCC CAGAGAGTCC CCTTCCTGGC 3300
CAGACAAATG CTGGGGAGGG ACAGAGGGGT GTGATCATTG CCCAGGAGTG CAGACAGTGG 3360
GGTCCCGGGT CGGGCAGTGC CTCCCACCCT GCTGAGGGGG GCGCCCAGGC AGGAAGCGGT 3420
GGGTGGGCCG GGGTAGAGAC GCTGGCACGT CCCAGTTCAT GCCGAAGGAA TTCTGAATTA 3480
GCGGGCGGCT GGCTGCCTGG GACCTCCGGG GCGGCCCCCT GGCCCCCGCC GCTCCGTCTG 3540
GCCTGCTCCT CCTGCTCCTT CGCACGGACG CTGAGACCTC CGCTGAGCCC TGGGACAAGC 3600
CCCAAATGCA ACTGCGATTG CAGGCTTCGC AAGACCCGCC TCCTCCCAAG GCCAAATTTG 3660
CCTGGGAGAA GTCATTCAGG GCCCAGACTA GAACCATGTT GGTGCCACCT CATCCATCTG 3720
GGGCATGAAG GACCGTCCAG GGCTGCAGTT TAGCTTCTTA ATAGGAACCT GGGGGTGGGT 3780
GCAGCCTCTG TTCTCCGAGC CTCTTTGGAA ATCGGTTTTG TTTTTGTTTT TGTTTTTTCC 3840
AATACTCTTT TCCTCTCATC CCATCCCGGG ACTGTTTTCC TCCCTAAGGG TTGAGAGCCC 3900
TGCAGTCTTC CCTAACCTTT TCTTTGCTTC TACCCCAGGG CCTTTGCACA TGGAGTCCCA 3960
CCTCTCCCCT TGCCCAACTG GGGCTCCAGC CTTACTGCAT TTGGCTCTTG GTAACTGTCC 4020
CAGGGCCTCT CTGACACACA GGGTTGTAGC CCCAGCTCCC TCTCTTCTCC TCCCCCCTTT 4080
CTCTTTTGCT TCTGAGACTT AATTTTTTTC TTTTTCTTTT TGGCTTTTTG AGACAGGGTT 4140
TCTCTGTACA GCCCTGGCTG CCCTGGCACT CATTCTGTAG ACCAGGCTAG CCTCAAACTC 4200
ACAAACCTAC CTGCCTCTGC CTTTCCAGTG CTGGCACTAA AGATGTGGGC CACCACAACT 4260
AGTAGTTAAG TGTTTTGCTG TGTCTTTATT CCTATAGTGA CCTCAGTTCC TGGCATATTG 4320
TAGGCGATGG ATGGATGAAT GGATGGATGG ATGGATGGAT GGATGGTTGG ATGGAGCAAG 4380
CTTGAATCGT CCTGAGTGAA AAAAGAGACC TCAGAGAACT GAATGGAGTT AGGTTCCCAG 4440
GGCAGCCTGG CCTGCTGGTC TCATGGGAGC TCCCTGTGAA ACTTCCCCCA CACCTCCCAC 4500
CACCCTGCCA TCCTGTGTGG CTGACAAGAA AGGCCAATGG CCAGATGGGG ACACAGACTC 4560
AGGGAAGCTT GGAATATGTT CCCCTCCTCA TATCCTAGGC CTTGTTGTCC CCCTGAGGGC 4620
CCAGCCTATG AGTAGGGCAG CTGTGGGCTG CCCTAAGGTT GGGTAGGCAA GAAGGGGGTG 4680
GTCCCTCAGG GTGGGTCACA GGATTGAGGT CATTTCCAAA GTGGCCATCA CAGTGGCCCT 4740
AGGAAATGAT TGTGGAGAGT CAGAACTCCT GTTGGGAGTT GTAGAGGGCC TTGCATGTGG 4800
GCTTCTGTGG CTGTCCCTTC TCTTGTGGTC CTTTGCACAG TCCCCTCGTG TGTGCTGGGA 4860
TGTGAGGAGG GCACGGGGAA AATGAAGGCT CAGCCCCTCA GCTTGCCCTT CACGGTTCAC 4920
CCAACAGGGC TCACCTCTCC TCTGGACAGG CTCTCACTGT ATGCACAGAT TGGCCTCACA 4980
TTTGATTCCC TTCCTTTGGT CTCCTGGGAT GACAAACATT TACCAGGGTA GGATTTTACA 5040
TTTTAGATAT GTCCATTCTC CAGAAACACA CTTGTGAGGT TAGGGTATCA GTGAAAGGAC 5100
ACCACCAGGA CAGACAAAGA ATTGGAGAGG AAGGAAATTG GTAAGCCAGG CCATGCTTGA 5160
TGGCTTATGT GTAATCCCAG AACTCTGGAC GCTGAGGCAG GAGGATTCCA AGTTTCAAGA 5220
CAGTGTGTTC TAGGTAATGA GACCCTGTCA AGAAAAGAAA AGAAATAAAG AGACAAGAAA 5280
ATGTTTATAG GCTGTGAGAC AGCTTGGTGG GTAAGGGGCA CTTGCCTCCA ATCAAGATGA 5340
CCTCAGCCCC ATCCCTAGGA ATCCATGGTA GAAGGAGAAA GCAAACTCCA GCTGCTGACC 5400
TCCATACATG TGCTCCAATG TGCACACACA CAGGGAGACA TAATCAATTA ATAGGATGTA 5460
TTTGCTTAGA TTTGAGTAGG CATTTATGAC TGATGTTTTA AAATTTTTAT TTGATTTTAT 5520
GAAAATATAC CTGTTTGTAT TTGGTTTGGT TTGGTTTGAG TTTTGTTTAT TTGAGACAGG 5580
GCTTCTCTGT GTAGTCCTGG CTGTCCTTGG AACTCACTCT GTAGACCAGG CTGGCCTTGA 5640
ACTCAGAAAT CCGCCTGCTT GTGCTTCCCA AGTGCTTAGA TTAAAGGTGT GCACTGCCAT 5700
TCAGCAAAAT TGCATACTTT AACCCCAGTA TTTGGGAGGC AGAGGCAGAC TAATGTGTGA 5760
ATTCCAGGCT AGCCAAGGAT ACAGAGTGAG ACCCTATTCT TACCCTCCCC CCCCAAAACC 5820
CCAAAATGTA TTTTGTGCTT GTGTATGTAC ATGTGTGTTG CAGCACGTAA ATGTCCAAGG 5880
ACAACTTGTA GAAGTTCTCT CCGTTCACAG TCTAAGTCCT GAATTCAAAC TAAGGTCCTC 5940
AGGCTTAGCC ACAGTCTTCT TTATGTACTG AGCCATTTCA CTGGCCCTGG ATTGACTGAT 6000
GAATTAATTT TTGAGATAAG GTCTCTTGTA GCTCTAGCTA GGCTCAAACT ATGAACTCCC 6060
AAGGTCATCT TGAGCTGCTG GTACTCTTGC TTCCACCCCA AGTGGTGGAA TGATACTCAG 6120
GCAGCACTTC TCTGGGGAAG GGGCTGGCCT TGGCCTTGAT TTTGTTGCCT CAGCTTCAAT 6180
GAGTGCTTGG GTCTCGTTGT TTCTTTTCTT TATCTGTGAA ATGGGTGAAC ACCTGTTCAA 6240
GACTTCCTGA CTCTTGAAAC ATCCAGGCAG GGTGAGGGAC TTGAAGTGGG CTCATCCCAT 6300
GCCTAACAAA GTGTCGTCTT TGACCCCAGA CACAGCTGTA ATCAGCCCCC AGGACCCCAC 6360
CCTTCTCATC GGCTCCTCCC TGCAAGCTAC CTGCTCTATA CATGGAGACA CACCTGGGGC 6420
CACCGCTGAG GGGCTCTACT GGACCTTCAA TGGTCGCCGC CTGCCCTCTG AGCTGTCCCG 6480
CCTCCTTAAC ACCTCCACCC TGGCCCTGGC CCTGGCTAAC CTTAATGGGT CCAGGCAGCA 6540
GTCAGGAGAC AATCTGGTGT GTCACGCCCG AGACGGCAGC ATTCTGGCTG GCTCCTGCCT 6600
CTATGTTGGC TGTAAGTGGG GCCCCAGACA CTCAGAGATA GATGGGGGTT GGCAATGACA 6660
GATTTAGAGC CTGGGTCTTC TGTCCTGGGG CAGAGCCATG GGCTCTCACT TGCATGCAGG 6720
CATGGTCATA CCCAGCACAG GCATTGCAAC TCTAGGGACA GCTGTGGCTG CACTGTCCCC 6780
TGTGTACCCC ACAGCTTTAG AAAAGCTGTC ATGTTTTCCT TGTAGTGCCC CCTGAGAAGC 6840
CCTTTAACAT CAGCTGCTGG TCCCGGAACA TGAAGGATCT CACGTGCCGC TGGACACCGG 6900
GTGCACACGG GGAGACATTC TTACATACCA ACTACTCCCT CAAGTACAAG CTGAGGTTGG 6960
TACCCAGCCA AGCCTTGCTG TGTGACTTCT GGCAATACTT ACCTTCTCTG ATCAAATATG 7020
TTCCTCnTA TGAACTCAAA AGGGACTCTC GCACCTCCAC AGGTGGTACG GTCAGGATAA 7080
CACATGTGAG GAGTACCACA CTGTGGGCCC TCACTCATGC CATATCCCCA AGGACCTGGC 7140
CCTCTTCACT CCCTATGAGA TCTGGGTGGA AGCCACCAAT CGCCTAGGCT CAGCAAGATC 7200
TGATGTCCTC ACACTGGATG TCCTGGACGT GGGTGAGCCC CCAGTGTCCA CCTGTGTTCT 7260
GCCCTAGACC TTATAGGGCG CCTCCCCCCC ATCCCCCCAG ACTTTTTGGT TCTTCTAGAG 7320
GTCTTAGCCA CAGCCACGGT GGTTGCAGGA CAGTGGTTGT TCATAACTTA ATGCAAAGAC 7380
TTTCCCCCAA GACAGTCAAG ATTTTCCCCT CCCCACCCCC AACACACACA TACACACACA 7440
CTCTGCAGAG AACACCTGGC CTGACCACCC TCCCTCTCTA CAGCCCAGGT GTTCAGAAGG 7500
GAGTCCTAGG GGACTGAGAG GAGGCGCCCA GGTCTGAAGG CGCCCCAGGA AGCCGAGGCC 7560
TTGAGCTGGG GGGGGGGGCG AGGGTTGGAG GCACGAACTG GATGATCCCT GAGCACAACT 7620
GGGCCTAATC TAATTAGGGT GTTCCCAGCC CAAAGCAGCC TGGGCCATTT AACCCTTCAA 7680
GTGCCTCACT GAAGACTCAG GGGAGAGATC AGCTTGTACT CTCTCCATGG TCCCCCAGGA 7740
GGGTTCCTGG GTGCCCCTGG CTCATTCCCA CATCCAGAGG TTTTGTGTCT TCCTGGCATC 7800
TAACCCTCAG TTGTGCTCTG TGGCTGGCAC AGCTGCCCCG TGGAGGCTCT TGGTAATGTA 7860
CAAGGCATCA GAGGTGGACA TGGGATGGGG ATACATAGGG ATGGAGCCAA ATAGCACCTC 7920
AAGGTGGGGT GATATACAAT AAAGCTTGTC ACCCTGACGC TCAGAAAGCC TACTCATGAT 7980
GATCACAATT GTTGACATCA CTCTGGGACA TGTAGTGAGA CCCTAGCTCA AAACACAGAC 8040
AGTAGCTTTA AGAGTCAGCT TGTGACTTAA TACTGGAACT CAGGGCCTAA TAGGTGCTGG 8100
GTGATGCTCG CCTCACTCCC TGTTTAGTGA GATCTCTGCG CTAATCTCCA CCCCAGCTGG 8160
GTGGGCTGCT CTGTCCCCTT GAGGGCAGGA ATGTGTGTCT TCCATCAGAG ATAGGACCCG 8220
TGGTAGCAGC AACTGCTGCT GGCTGTTTCT GGAATATTAA ATGACAGTAA TCTATCAGGC 8280
CTGGGTGAGT AGCTAACAGG GGTGGGGGCG TGGTCTGGAA AACGCAGATA GGGTCATAGG 8340
AGCCACTGCA GCCTAGATTA CACCACTGGG TGTTCTGTCA CTAGGCCATT CTCACCAAGC 8400
AGTCCTCAGA ACTGGGAGCA CTGTTGCCAG CATTTAATGC CAGCATTTAA TGCCAGCATT 8460
AGGGGAGGCA GAGGCAGAAG GATCTCTCTG AGTTCAAGGC CATCCTGAAT TTACATAAAG 8520
AGCTCCAGGC CAGCCAGGGT GCGCAGTAAA ACCTTGTCTC AAAAAACAAA GCATCTTTAG 8580
TGACCAGGCT TGCTCCACCC CCAGTGACCA CGGACCCCCC ACCCGACGTG CACGTGAGCC 8640
GCGTTGGGGG CCTGGAGGAC CAGCTGAGTG TGCGCTGGGT CTCACCACCA GCTCTCAAGG 8700
ATTTCCTCTT CCAAGCCAAG TACCAGATCC GCTACCGCGT GGAGGACAGC GTGGACTGGA 8760
AGGTGCCCGT CCCGCCCCGG ACCCGCCCCT GACCCCGCCC CCCGCATCTG ACTCCTCCCT 8820
CACCGTGCAG GTGGTGGATG ACGTCAGCAA CCAGACCTCC TGCCGTCTCG CGGGCCTGAA 8880
GCCCGGCACC GTTTACTTCG TCCAAGTGCG TTGTAACCCA TTCGGGATCT ATGGGTCGAA 8940
AAAGGCGGGA ATCTGGAGCG AGTGGAGCCA CCCCACCGCT GCCTCCACCC CTCGAAGTGG 9000
TGAGCACCTC TCCAGGGCTG GCTGGCCCAT GGAATCCCCA ATCCATCCTG TTCCTTCCCC 9060
CCCACCCTTT TTTTGAGACA GCGTCTTCAG GTAGCGCATG CTGGCCTTAA ATTCAGTATG 9120
TAGTCAAGGA TGACCTCGAG CTCCTGGTCT TTTTGTCTCC ACTTAGAGAC AATGGCCAGT 9180
GGCCATCACC ACCTTTGGGA GACTAGCCAT GGAGTCTATT TAGCCTGTCA TTTGGTGACA 9240
GATGGAGTAC AACAGTGTGA CCTCTTGTAA GAGAACTGAA GACAGGCTGT TTTTAACCCC 9300
AATATCCTAG GCTCTCTAGA GGTTAACTTT ATATAAAATA GAGACTATTA CAGCCAGTTA 9360
TCACATGGTC CCACAGAACC TTTTGTCACA CAACCTATAG ACCACAGTGC CTGTGCCTAC 9420
CACATAAGGG TCTCTACTGC TGGCCCACCC CTCCAACCCT TAAAAGGTAA CCTAGGCAGC 9480
CTTAATATTT GCAATCCTCC TACCTCAGCC TCTTGAATGC TCAGAAACCA GGCATTAACC 9540
CAAGTTTCTC TTCTCTGGGT CCCTTTCTTA AGGTGGGAGG GCCTAAAGAT GACTTCCTTT 9600
GTCCTGAAGA CTCTCCGAGC CCATGGATCT GCACTCTCTA ATATGAAATA TATTGCATAA 9660
AATGTCTGGC CTCAGTTTCC CCACCTGTCA GGTTTAGGCA GCACAGTCGG TCCAAGACAC 9720
TTCATTATTT GCAGGCAGTA TAAGAAGAAG CTCCCATCCC CCACCCGCTT CCTCCGGTCC 9780
CTAAGACAGA ATACTTCTAC ACTGAAACTG AACTCTCGCA GACGCATATG CTCACTTTAA 9840
TGATGATGAA ATAATGGGGA AACTGAGGCT CCGAGAGATT CCTGGAGGAA GAGGGTCAAA 9900
ACCAGCTCCA GGAAGCTCTC CAGCCCCCAT CCGGGCCTCT CCAGGTTCTG GGCTTGGCGG 9960
GAGTGAACAC AGGTGGGAGG GGCTGGAGCC TGGGAGCTTT GGCCCTTGCT CGTGCCCAGC 10020
ACCTGCGATT CTTGCACGGG AGCCAGCAGG CGGCTGCGTC CGCCCGAGAG ACTGAAGAAG 10080
CCGGGGGTAG GGTTGGAGGG AGGTAAGCAG GGGCTGTGGG GGCCGAAGCT TGTGCCAGGG 10140
CCTGTCAGCG AGTCCCCAGT TTTATTTATG GCGTGAGGCC GATGTCCTTA TCCGCTGGCC 10200
TGCTGGGGGA TGGCTGCGGC TGGGGATTGG ACCCAAGGGC TGGCTTCCCA CTCAGTCCTC 10260
CAGCCCACTC CATGTCACAC CCGTGCATTC TCTGAGGCTT ATCTTGGGAA CCCGCCCTTG 10320
TTCTGTGCTG TCTGTCTCTA TTTCTGTCAT TCACTTTCCC AGAGCCTTTT TTTTATGCTT 10380
TTAATATAAC TACGTTTTAA AAATTGCTTT TGTATAATGT GTGTGCCTTC GTGAGCGTGC 10440
GTGCCACAAC ACACACGTGA AGGTTAGAGA ACTTTGTTGA GTAGGCTCCT TCCACCATGT 10500
GGGACTAGGG CTGGCGACAA GAGCAATTAC TGAGTCATCT CGCCAGCCCC TCACCCCTCA 10560
CTTCCCATCC TGTTTGGATA GTCATAGGTA ATCGAAGGTA AATCGCTGGC TTTAATTTCG 10620
TAGCTATCCT GCCTCAGCCT ACCAAGTGCT GTGCTACCAC GTTTGTGGGA GGGGCTCTCC 10680
TCCCAGTGTC TGGGGGTACA CAGTCCCAAG ATCTCTGCTT TCTAGGTCTT TGTCTTAGTT 10740
TGCCCCTTGC TTTGTCCGTG TCCCTAGAGT CTCCGGCCCC ACTTAGTCTC CATTGATTTC 10800
CTTTCTGACC GAATACTCGG TTTTACCTCC CACTGATTTG ACTCCCTCCT TTGCTTGTCT 10860
CCATCGCCGT GGCATTGCCA TTCCTCTGGG TGACTCTGGG TCCACACCTG ACACCTTTCC 10920
CAACTTTCCC CAGCCGAAGC TGGTCTGGTA TGGGAGGCCG CCGTCCCGCG CGCGCCTCCT 10980
GCTGGCCGCG CCCCAACACT GCCGCTCCAT TCTCTTTAGA GCGCCCGGGC CCGGGCGGCG 11040
GGGTGTGCGA GCCGCGGGGC GGCGAGCCCA GCTCGGGCCC GGTGCGGCGC GAGCTCAAGC 11100
AGTTCCTCGG CTGGCTCAAG AAGCACGCAT ACTGCTCGAA CCTTAGTTTC CGCCTGTACG 11160
ACCAGTGGCG TGCTTGGATG CAGAAGTCAC ACAAGACCCG AAACCAGGTA GGAAAGTTGG 11220
GGGAGGCTTG CGTGGGGGGT AAAGGAGCAG AGGAAGAGAG AGACCCGGGT GAGCAGCCTC 11280
CACAACACCG CACTCTTCTT TCCAAGCACA GGACGAGGGG ATCCTGCCCT CGGGCAGACG 11340
GGGTGCGGCG AGAGGTAAGG GGGTCTGGGT GAGTGGGGCC TACAGCAGTC TAGATGAGGC 11400
CCTTTCCCCT CCTTCGGTGT TGCTCAAAGG GATCTCTTAG TGCTCATTTC ACCCACTGCA 11460
AAGAGCCCCA GGTTTTACTG CATCATCAAG TTGCTGAAGG GTCCAGGCTT AATGTGGCCT 11520
CTTTTCTGCC CTCAGGTCCT GCCGGCTAAA CTCTAAGGAT AGGCCATCCT CCTGCTGGGT 11580
CAGACCTGGA GGCTCACCTG AATTGGAGCC CCTCTGTACC ATCTGGGCAA CAAAGAAACC 11640
TACCAGAGGC TGGGCACAAT GAGCTCCCAC AACCACAGCT TTGGTCCACA TGATGGTCAC 11700
ACTTGGATAT ACCCCAGTGT GGGTAGGGTT GGGGTATTGC AGGGCCTCCC AAGAGTCTCT 11760
TTAAATAAAT AAAGGAGTTG TTCAGGTCCC GATGGCCAGT GTGTTTGGGG CCTATGTGCT 11820
GGGGTGGGGG GA 11832



(2) INFORMATION FOR SEQ ID NO:39:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:

Val Ile Ser Pro Gln Asp Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser
1               5                   10                  15                  20

1. A nucleic acid molecule comprising a sequence of
nucleotides encoding or complementary to a sequence encoding a
novel haemopoietin receptor or derivative thereof having the
motif:

Trp Ser Xaa Trp Ser [SEQ ID NO:1],
wherein Xaa is any amino acid.

2. A nucleic acid molecule according to claim 1 wherein Xaa
is Asp or Glu.

3. A nucleic acid molecule according to claim 1 or 2 wherein
said nucleic acid molecule is capable of hybridisation under
low stringency conditions at 42°C to:

5' (A/G)CTCCA(A/G)TC(A/G)CTCCA 3' [SEQ ID NO:7]; and
5' (A/G)CTCCA(C/T)TC(A/G)CTCCA 3' [SEQ ID NO:8].

4. A nucleic acid molecule according to claim 3 comprising a
sequence of nucleotides substantially as set forth in SEQ ID
NO:12 or a nucleotide sequence having at least 60% similarity
to the nucleotide sequence set forth in SEQ ID NO:12 or a
nucleotide sequence capable of hybridising thereto under low
stringency conditions at 42°C.

5. A nucleic acid molecule according to claim 3 comprising a
sequence of nucleotides substantially as set forth in SEQ ID
NO:14 or a nucleotide sequence having at least 60% similarity
to the nucleotide sequence set forth in SEQ ID NO:14 or a
nucleotide sequence capable of hybridising thereto under low
stringency conditions at 42°C.

6. A nucleic acid molecule according to claim 3 comprising a
sequence of nucleotides substantially as set forth in SEQ ID
NO:16 or a nucleotide sequence having at least 60% similarity
to the nucleotide sequence set forth in SEQ ID NO:16 or a
nucleotide sequence capable of hybridising thereto under low
stringency conditions at 42°C.

7. A nucleic acid molecule according to claim 3 comprising a
sequence of nucleotides substantially as set forth in SEQ ID
NO:18 or 24 or a nucleotide sequence having at least 60%
similarity to the nucleotide sequence set forth in SEQ ID NO:18
or 24 or a nucleotide sequence capable of hybridising thereto
under low stringency conditions at 42°C.

8. A nucleic acid molecule according to claim 3 comprising a
sequence of nucleotides substantially as set forth in SEQ ID
NO:28 or a nucleotide sequence having at least 60% similarity
to the nucleotide sequence set forth in SEQ ID NO:28 or a
nucleotide sequence capable of hybridising thereto under low
stringency conditions at 42°C.

9. A nucleic acid molecule according to claim 3 comprising a
sequence of nucleotides substantially as set forth in SEQ ID
NO:38 or a nucleotide sequence having at least 60% similarity
to the nucleotide sequence set forth in SEQ ID NO:38 or a
nucleotide sequence capable of hybridising thereto under low
stringency conditions at 42°C.

10. A nucleic acid molecule according to claim 4 or 5 or 6 or
7 or 8 or 9 wherein said haemopoietin receptor is of murine
origin.

11. A nucleic acid molecule according to claim 9 wherein said
haemopoietin receptor is of human origin.

12. An expression vector comprising a nucleic acid molecule
selected from the list consisting of:

(i) a nucleotide sequence as set forth in SEQ ID NO:12;
(ii) a nucleotide sequence as set forth in SEQ ID NO:14;
(iii) a nucleotide sequence as set forth in SEQ ID NO:16;
(iv) a nucleotide sequence as set forth in SEQ ID NO:18;
(v) a nucleotide sequence as set forth in SEQ ID NO:24;
(vi) a nucleotide sequence as set forth in SEQ ID NO:28; and
(vii) a nucleotide sequence as set forth in SEQ ID NO:38.

13. A method for cloning a nucleotide sequence encoding a
haemopoietin receptor having the characteristics of NR6 or a
derivative thereof, said method comprising searching a
nucleotide database for a sequence which encodes an amino acid
sequence as set forth in one or more of SEQ ID NO:1, SEQ ID
NO:7 and/or SEQ ID NO:8, designing one or more oligonucleotide
primers based on the nucleotide sequence located in said
search, screening a nucleic acid library with said one or more
oligonucleotides and obtaining a clone therefrom which encodes
NR6 or a part or derivative thereof.

14. An isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a haemopoietin receptor or derivative
thereof having an amino acid sequence substantially as set
forth in SEQ ID NO:13 or having at least about 50% similarity
thereto.

15. An isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a haemopoietin receptor or derivative
thereof having an amino acid sequence substantially as set
forth in SEQ ID NO:15 or having at least about 50% similarity
thereto.

16. An isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a haemopoietin receptor or derivative
thereof having an amino acid sequence substantially as set
forth in SEQ ID NO:17 or having at least about 50% similarity
thereto.

17. An isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a haemopoietin receptor or derivative
thereof having an amino acid sequence substantially as set
forth in SEQ ID NO:19 or having at least about 50% similarity
thereto.

18. An isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a haemopoietin receptor or derivative
thereof having an amino acid sequence substantially as set
forth in SEQ ID NO:25 or having at least about 50% similarity
thereto.

19. An isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a haemopoietin receptor or derivative
thereof having an amino acid sequence substantially as set
forth in SEQ ID NO:29 or having at least about 50% similarity
thereto.

20. An isolated novel haemopoietin receptor comprising the
amino acid motif:

Trp Ser Xaa Trp Ser [SEQ ID NO:1]

wherein Xaa is any amino acid.

21. An isolated haemopoietin receptor according to claim 20
wherein Xaa is Asp or Glu.

22. An isolated haemopoietin receptor according to claim 21
comprising the amino acid sequence substantially as set forth
in SEQ ID NO:13.

23. An isolated haemopoietin receptor according to claim 21
comprising the amino acid sequence substantially as set forth
in SEQ ID NO:15.

24. An isolated haemopoietin receptor according to claim 21
comprising the amino acid sequence substantially as set forth
in SEQ ID NO:17.

25. An isolated haemopoietin receptor according to claim 21
comprising the amino acid sequence substantially as set forth
in SEQ ID NO:19.

26. An isolated haemopoietin receptor according to claim 21
comprising the amino acid sequence substantially as set forth
in SEQ ID NO:25.

27. An isolated haemopoietin receptor according to claim 21
comprising the amino acid sequence substantially as set forth
in SEQ ID NO:29.

28. A method for modulating expression of NR6 in a mammal,
said method comprising contacting a genetic sequence encoding
said NR6 with an effective amount of a modulator of NR6
expression for a time and under conditions sufficient to
up-regulate or down-regulate or otherwise modulate expression of
NR6, wherein the genetic sequence encoding said NR6 is selected
from the nucleotide sequence set forth in SEQ ID NO:12 or 14 or
16 or 18 or 24 or 28 or 38 or is a sequence having at least
about 60% similarity to at least one of SEQ ID NO:12 or 14 or
16 or 18 or 24 or 28 or 38 and is capable of hybridising
thereto under low stringency conditions at 42°C.

29. A method of modulating activity of NR6 in a mammal, said
method comprising administering to said mammal a modulating
effective amount of a molecule for a time and under conditions
sufficient to increase or decrease NR6 activity wherein said
NR6 comprises an amino acid sequence:

(i) encoded by a nucleotide sequence selected from the
nucleotide sequence set forth in SEQ ID NO:12 or 14 or 16
or 18 or 24 or 28 or 38 or a nucleotide sequence having
at least 60% similarity to the nucleotide sequence set
forth in SEQ ID NO:12 or 14 or 16 or 18 or 24 or 28 or 38
and which is capable of hybridising thereto under low
stringency conditions at 42°C; and
(ii) substantially as set forth in SEQ ID NO:12 or 14 or 16 or
18 or 32 or 30 or a sequence having at least 50%
similarity thereto.

30. A pharmaceutical composition comprising an NR6 receptor in
soluble form and one or more pharmaceutically acceptable
carriers and/or diluents wherein said NR6 comprises the amino
acid sequence:

(i) encoded by a nucleotide sequence selected from the
nucleotide sequence set forth in SEQ ID NO:12 or 14 or 16
or 18 or 24 or 28 or 38 or a nucleotide sequence having
at least 60% similarity to the nucleotide sequence set
forth in SEQ ID NO:12 or 14 or 16 or 18 or 24 or 28 or 38
and which is capable of hybridising thereto under low
stringency conditions at 42°C; and
(ii) substantially as set forth in SEQ ID NO:12 or 14 or 16 or
18 or 32 or 30 or a sequence having at least 50%
similarity thereto.

31. An isolated antibody or a preparation of antibodies to an
NR6 receptor, said NR6 receptor comprising the amino acid
sequence:

(i) encoded by a nucleotide sequence selected from the
nucleotide sequence set forth in SEQ ID NO:12 or 14 or 16
or 18 or 24 or 28 or 38 or a nucleotide sequence having
at least 60% similarity to the nucleotide sequence set
forth in SEQ ID NO:12 or 14 or 16 or 18 or 24 or 28 or 38
and which is capable of hybridising thereto under low
stringency conditions at 42°C; and
(ii) substantially as set forth in SEQ ID NO:12 or 14 or 16 or
18 or 24 or 28 or 38 or a sequence having at least 50%
similarity thereto.



32. A transgenic animal comprising a mutation in at least one
allele of the gene encoding NR6.

33. A transgenic animal according to claim 32 comprising a
mutation in two alleles of the gene encoding NR6.



34. A transgenic animal according to claim 32 or 33 wherein
said animal is a murine animal.
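
The claims above recite thresholds such as "at least 60% similarity" to a reference nucleotide sequence, but this section does not fix how that percentage is computed. The following is an illustrative sketch only. It assumes that similarity is taken as percent identity over two sequences that have already been aligned to equal length; the names `percent_identity` and `meets_threshold`, the gap character and the example inputs are hypothetical and are not taken from the specification.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions at which two pre-aligned sequences match.

    Assumption: both strings are already aligned (same length, '-' for gaps).
    Columns that are gaps in both sequences are skipped; a gap opposite a
    residue is counted as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # nothing to compare in this column
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(seq_a: str, seq_b: str, threshold: float = 60.0) -> bool:
    """True when percent identity is at or above the stated threshold."""
    return percent_identity(seq_a, seq_b) >= threshold
```

A real comparison would first need an alignment step, and a hybridisation condition such as the one recited in the claims is an experimental test that this arithmetic does not model.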
Fig.2 
g1 cccagaactct
g38 agtttcaagacagtgtgtt
g83 aagaaaagaaataaagaga
g128 cagcttggtgggtaagggg
g173 agccccatccctaggaatc
g218 cagctgctgacctccatac
g263 ggagacataatcaattaat
g308 ggcatttatgactgatgtt
g353 aatatacctgtttgtattt
g398 atttgagacagggcttctc
g443 tcactctgtagaccaggct
g488 ttgtgcttcccaagtgctt
g533 gcaaaattgcatactttaa
g578 actaatgtgtgaattccag
g623 ctattcttaccctcccccc
g668 ttgtgtatgtacatgtgtg
g713 acttgtagaagttctctcc
g758 actaaggtcctcaggctta
g803 catttcactggccctggat
g848 aggtctcttgtagctctag
g893 gtcatcttgagctgctggt
g938 aatgatactcaggcagcac
g983 ccttgattttgttgcctca
g1028 gtttcttttctttatctgt
g1073 ttcctgactcttgaaacat
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tggacgctgaggcaggaggattccca 

tctaggtaatgagaccctgtcaagaa 

caagaaaatgtttataggctgtgaga 

cact tgcc t ccaatcaagatgacc t c 

catggtagaaggagaaagcaaactcg 

atgtgctccaatgtgcacacacacag 

aggatgtatttgcttagatttgagta 

ttaaaatttttatttgattttatgaa 

ggtttggtttggtttgagttttgttt 

tgtgtagtcctggctgtccttggaac 

ggccttgaactcagaaatccgcctgc 

agattaaaggtgtgcactgccattca 

ccccagtatttgggaggcagaggcag 

gctagccaaggatacagagtgagacc 

ccaaaaccccaaaatgtattttgtgc 

ttgcagcacgtaaatgtccaaggaca 

gttcacagtctaagtcctgaattcaa 

gccacagtcttctttatgtactgagc 

tgactgatgaattaatttttgagata 

ctaggctcaaactatgaactcccaag 

actcttgcttccaccccaagtggtgg 

ttctctggggaaggggctggccttgg 

gcttcaatgagtgcttgggtctcgtt 

gaaatgggtgaacacctgttcaagac 

ccaggcagggt gagggac t tgaagt g 
g1118 ggctcatcccatgcctaac
g1163 agctgtaatcagcccccag
L Q A T C S
g1208 CCTGCAAGCTACCTGCTCT
A E G L Y W
g1253 CGCTGAGGGGCTCTACTGG
E L S R L L
g1298 TGAGCTGTCCCGCCTCCTT
A N L N G S
g1343 GGCTAACCTTAATGGGTCC
C H A R D G
g1388 GTGTCACGCCCGAGACGGC
V G
g1433 TGTTGGCTgtaagtggggc
g1478 ttggcaatgacagatttag
g1523 agccatgggctctcacttg
g1568 aggcattgcaactctaggg
g1613 gtaccccacagctttagaa
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aaagtgtcgtctttgaccccagacac 
DPTLLIGSS 
GACCCCACCCTTCTCATCGGCTCCTC 

IHGDTP GAT 
ATACATGGAGACACACCTGGGGCCAC 



TFNGRRLPS 
ACCTTCAATGGTCGCCGCCTGCCCTC 

NTSTLALAL 
AACACCTCCACCCTGGCCCTGGCCCT 

RQQSGDNIiV 
AGGCAGCAGTCAGGAGACAATCTGGT 

SILAGSCLY 
AGCATTCTGGCTGGCTCCTGCCTCTA 

cccagacactcagagatagatggggg 

agcc tgggt ct t c tgt cc tggggcag 
catgcaggcatggtcatacccagcac 
acagc tgtggc tgcac t gt cccc tgt 

L 

aagctgtcatgt t ttcc t tgtagTGC 
P P E K P F N
g1658 CCCCTGAGAAGCCCTTTAA
K D L T C R W
g1703 AGGATCTCACGTGCCGCTG
F L H T N Y S
g1748 TCTTACATACCAACTACTC
g1793 ccagccaagccttgctgtg
g1838 tgatcaaatatgttcctgt
W Y G
g1883 cctccacagGTGGTACGGT
T V G P H S
g1928 CACTGTGGGCCCTCACTCA
F T P Y E I
g1973 CTTCACTCCCTATGAGATC
S A R S D V
g2018 CTCAGCAAGATCTGATGTC
g2063 tgagcccccagtgtccacc
g2108 cgcctcccccccatccccc
g2153 ttagccacagccacggtgg
g2198 taatgcaaagactttcccc



Fig.2(v) 
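
Fig.2 prints single-letter amino acids above the upper-case exon nucleotides. The following minimal sketch shows that codon-by-codon reading. It assumes the standard genetic code and uses an in-frame stretch taken from the row labelled g1658 together with its continuation on the facing sheet; the name `translate` and the table construction are illustrative choices, not part of the specification.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Exon stretch starting at the first complete codon of the g1658 row.
exon = "CCTGAGAAGCCCTTTAACATCAGCTGCTGGTCCCGGAACATG"
print(translate(exon))  # PEKPFNISCWSRNM
```

The printed residues correspond to the letters shown above these nucleotides in the figure (P E K P F N, then I S C W S R N M). Lower-case intron sequence would have to be removed before translating across an exon boundary.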
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ISCWSRNM 
CATCAGCTGCTGGTCCCGGAACATGA 

TPGAHGET 
GACACCGGGTGCACACGGGGAGACAT 

L K Y K L R 
CCTCAAGTACAAGCTGAG crt tqqt ac 
tgacttctggcaatacttaccttctc 
ttatgaactcaaaagggactctcgca 

QDNTCEEYH 
CAGGATAACACATGTGAGGAGTACCA 

CHIPKDLAL 
TGCCATATCCCCAAGGACCTGGCCCT 

WVEATNRLG 
TGGGTGGAAGCCACCAATCGCCTAGG 

LTLDVLDV 
CTCACACTGGATGTCCTGGACGTGG q 

tgtgttctgccctagaccttataggg 
cagactttttggttcttctagaggtc 
ttgcaggacagtggttgttcataact 
caagacagtcaagatttttcccctcc 
g2243 ccacccccaacacacacat
g2288 ggcctgaccaccctccctc
g2333 gtcctaggggactgagagg
g2378 ggaagccgaggccttgagc
g2423 acgaactggatgatccctg
g2468 ggtgttcccagcccaaagc
g2513 gcctcactgaagactcagg
g2558 tggtcccccaggagggttc
g2603 tccagaggttttgtgtctt
g2648 ctgtggctggcacagctgc
g2693 aggcatcagaggtggacat
g2738 caaatagcacctcaaggtg
g2783 cctgacgctcagaaagcct
g2828 tcactctgggacatgtagt
g2873 tagctttaagagtcagctt
g2918 taataggtgctgggtgatg
g2963 tctctgcgctaatctccac
g3008 cttgagggcaggaatgtgt
g3053 gtagcagcaactgctgctg
g3098 taatctatcaggcctgggt
g3143 gtctggaaaacgcagatag
g3188 ttacaccactgggtgttct
g3233 tcctcagaactgggagcac
g3278 taatgccagcattagggga
g3323 ttcaaggccatcctgaatt
g3368 ggtgcgcagtaaaaccttg



Fig.2(vii) 
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acacacacactctgcagagaacacct 
tctacagcccaggtgttcagaaggga 
aggcgcccaggtctgaaggcgcccca 

tgggggggggggcgagggttggaggc 

agcacaactgggcctaatctaattag 

agcctgggccatttaacccttcaagt 

ggagagatcagcttgtactctctcca 

ctgggtgcccctggctcattcccaca 

cctggcatctaaccctcagttgtgct 

cc cgt ggaggc t c 1 1 ggt aat gt aca 

gggatggggatacatagggatggagc 

gggtgatatacaataaagcttgtcac 

actcatgatgatcacaattgttgaca 

gagaccctagctcaaaacacagacag 

gtgacttaatactggaactcagggcc 

c t cgcctcactccctgtttagtgaga 

cccagctgggtgggctgctctgtccc 

gtcttccatcagagataggacccgtg 

gctgtttctggaatattaaatgacag 

gagt age t aacagggg t gggggcg t g 

ggtcataggagccactgcagcctaga 

gtcactaggccattctcaccaagcag 

tgttgccagcatttaatgccagcatt 

ggcagaggcagaaggatctctctgag 

t acat aaagagc t ccaggccagccag 

tctcaaaaaacaaagcatctttagtg 
g3413 accaggcttgctccacccc
V H V S R V G
g3458 GTGCACGTGAGCCGCGTTG
R W V S P P
g3503 CGCTGGGTCTCACCACCAG
K Y Q I R Y
g3548 AAGTACCAGATCCGCTACC
g3593 gtgcccgtcccgccccgga
g3638 ctgactcctccctcaccgt
Q T S C R L A
g3683 AGACCTCCTGCCGTCTCGC
F V Q V R C N
g3728 TCGTCCAAGTGCGTTGTAA
K A G I W S E
g3773 AGGCGGGAATCTGGAGCGA
T P R S
g3818 CCCCTCGAAGTGgtgagca
g3863 aatccccaatccatcctgt

Fig.2(ix) 
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VTTDPPPD 
cagTGACCACGGACCCCCCACCCGAC 



GLEDQLSV 
GGGGCCTGGAGGACCAGCTGAGTGTg 

ALKDFLFQA 
CTCTCAAGGATTTCCTCTTCCAAGCC 

RVEDSVDWK 

GCGTGGAGGACAGCGTGGACTGGAAG 

cccgcccctgaccccgccccccgcat 

V V D D V S N 
gcaa GTGGTGGATGACGTCAGCAACC 

GLKPG TVY 
GGGCCTGAAGCCCGGCACCGTTTACT 

PFGIYGSK 
CCC AT TCGGGATCTATGGGTCGAAAA 

WSHPTAAS 
GTGGAGCCACCCCACCGCTGCCTCCA 



cc t c t ccagggc tggc tggcccatgg 
tccttcccccccaccctttttttgag 



Fig.2(x)
g3908 acagcgtcttcaggtagcg
g3953 gtcaaggatgacctcgagc
g3998 gacaatggccagtggccat
g4043 agtctatttagcctgtcat
g4088 tgacctcttgtaagagaac
g4133 tatcctaggctctctagag
g4178 ttacagccagttatcacat
g4223 acctatagaccacagtgcc
g4268 tgctggcccacccctccaa
g4313 taatatttgcaatcctcct
g4358 ccaggcattaacccaagtt
g4403 gtgggagggcctaaagatg
g4448 agcccatggatctgcactc
g4493 tgtctggcctcagtttccc
g4538 cggtccaagacacttcatt
g4583 cccatcccccacccgcttc
g4628 tacactgaaactgaactct
g4673 atgatgaaataatggggaa
g4718 gaagagggtcaaaaccagc
g4763 gggcctctccaggttctgg
g4808 aggggctggagcctgggag
g4853 ctgcgattcttgcacggga
g4898 gagactgaagaagccgggg
g4943 gctgtgggggccgaagctt
g4988 agttttatttatggcgtga
g5033 ctgggggatggctgcggct

Fig.2(xi)
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catgctggccttaaattcagtatgta 

tcctggtctttttgtctccacttaga 

caccacctttgggagactagccatgg 

ttggtgacagatggagtacaacagtg 

tgaagacaggctgtttttaaccccaa 

gttaactttatataaaatagagacta 

ggtcccacagaaccttttgtcacaca 

tgtgcctaccacataagggtctctac 

cccttaaaaggtaacctaggcagcct 

acctcagcctcttgaatgctcagaaa 

tctcttctctgggtccctttcttaag 

acttcctttgtcctgaagactctccg 

tctaatatgaaatatattgcataaaa 

cacctgtcaggtttaggcagcacagt 

atttgcaggcagtataagaagaagct 

ctccggtccctaagacagaatacttc 

cgcagacgcatatgctcactttaatg 

actgaggctccgagagattcctggag 

tccaggaagctctccagcccccatcc 

gcttggcgggagtgaacacagctggg 

ctttggcccttgctcgtgcccagcac 

gccagcaggcggctgcgtccgcccga 

gtagggttggagggaggtaagcaggg 

gtgccagggcctgtcagcgagtcccc 

ggccgatgtccttatccgctggcctg 

ggggattggacccaagggctggcttc 



Fig.2(xii) 
g5078 ccactcagtcctccagccc
g5123 tgaggcttatcttgggaac
g5168 ctatttctgtcattcactt
g5213 aatataactacgttttaaa
g5258 ttcgtgagcgtgcgtgcca
g5303 tttgttgagtaggctcctt
g5348 caagagcaattactgagtc
g5393 tcccatcctgtttggatag
g5438 ggctttaatttcgtagcta
g5483 gctaccacgtttgtgggag
g5528 gacacagtcccaagatctc
g5573 gccccttgctttgtccgtgt
g5618 cattgactggtctttcctt
g5663 ctgatttgactccctcctt
g5708 ccattcctctgggtgactc
g5753 actttccccagccgaagct
g5798 gcgcgcgcctcctgctggc
E R P G
g5843 tctttagAGCGCCCGGGCC
G G E P S S
g5888 GGCGGCGAGCCCAGCTCGG
F L G W L K
g5933 TTCCTCGGCTGGCTCAAGA
F R L Y D Q
g5978 TTCCGCCTGTACGACCAGT



Fig.2(xiii) 
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actccatgtcacacccgtgcattctc 

ccgcccttgttctgtgctgtctgtct 

tcccagagccttttttttatgctttt 

aattgcttttgtataatgtgtgtgcc 

caacacacacgtgaaggttagagaac 

ccaccatgtgggactagggctggcga 

atctcgccagcccctcacccctcact 

tcataggtaatcgaaggtaaatcgct 

tcctgcctcagcctaccaagtgctgt 

gggctctcctcccagtgtctgggggt 

tgctttctaggtctttgtcttagttt 

ccctagagtctccggccccacttatc 

taccgaatactcggttttacctccca 

tgcttgtctccatcgccgtggcattg 

tgggtccacacctgacacctttccca 

ggt ctggtatgggaggccgccgt ccc 

cgcgccccaacactgccgctccattc 

PGGGVCEPR 
CGGGCGGCGGGGTGTGCGAGCCGCGG 

GPVRRELKQ 
GCCCGGTGCGGCGCGAGCTCAAGCAG 

K H A Y C S N L S~ 
AGCACGCATACTGCTCGAACCTTAGT 

WRAWMQKSH 
GGCGTGCTTGGATGCAGAAGTCACAC 



Fig.2(xiv) 
K T R N Q V
g6023 AAGACCCGAAACCAGGTAG
G K G A E E
g6068 GGTAAAGGAGCAGAGGAAG
Q H R T L L
g6113 CAACACCGCACTCTTCTTT
P R A D G V
P S G R R G A
g6158 CCTCGGGCAGACGGGGTGC
g6203 GTGGGGCCTACAGCAGTCT
g6248 TGTTGCTCAAAGGGATCTC
g6293 GAGCCCCAGGTTTTACTGC
g6338 CTTAATGTGGCCTCTTTTC
g6383 CTAAGGATAGGCCATCCTC
g6428 CTGAATTGGAGCCCCTCTG
g6473 CCAGAGGCTGGGCACAATG
g6518 ACATGATGGTCACACTTGG
g6563 GGTATTGCAGGGCCTCCCA
g6608 TTGTTCAGGTcccgatggc
g6653 ggtgggggga

Fig.2(xv) 
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GKLGEACVG 
GAAAGTTGGGGGAGGCTTGCGTQGGG 

ERDPGEQPP 
AGAGAGACCCGGGTGAGCAGCCTCCA 

SKHRTRGSC 

D E G I L 
CCAAGCACAGGACGAGGGGATCCTGC 

RREVRGSG* 
A R 

GGCG AGAGGTAAGGGGGTCTGGG TGA 
AGATGAGGCCCTTTCCCCTCCTTCGG 
TTAGTGCTCATTTCACCCACTGCAAA 
ATCATCAAGTTGCTGAAGGGTCCAGG 

V L P A K L 
G P A G * 
TGrrCTCAGGTCCTGCCGGCTAAACT 



CTGCTGGGTCAGACGTGGAGGCTCAC 

TACCATCTGGGCAACAAAGAAACCTA 
AGCTCCCACAACCACAGCTTTGGTCC 
ATATACCCCAGTGTGGGTAGGGTTGG 
AGAGTCTCTTTAAATAAATAAAGGAG 
cagtgtgtttggggcctatgtgctgg 

Fig.2(xvi) 
Fig.3 
GCGGCCGCTG CAGTGATTAC TCACCGCGTG
TTTTTCCGTG GGGGGATGTG AAGAAGTTTA
GGAATGCAGG GTTCGGTCCC GTTCCCCAAA
AAGGGCTCCC TGCACGCGCT CCGGGACATC
TGAGAAGGGA CCAGAGGCCG GAGACTCCCT
ACGAAACGAG ACTACAGCGA TGGGAGAGGT
GACCCATGCA CCCAGAGAAA GGGACTGGTG
AGGGCTGAAA GAGGATGAAC GGGCTCAGGT
TGGGTATGGG GGCCCCGTAA GAGGGGCGGG
GGAGGGGATC CTGGAAAAGC ACCAGGGCTG
ACAGGATCCC AGATGAGGGG GTGGGAAGCC
CACGGGCTGG TGGGGAAAGA GTGGGGGGCT
GTAACTGGGC GGAGGCCGGC CGGGCGGGGC
GTGCGGGGCC CACGATCAAC CCCCCCCCAG
CGGGGCGAGC GGCGCATTAG CGCCTTGTCA
CGCTGTCCGC GCCCAGTGAC GCGCGTGAGG
CGCCCCCGCC CCATACCGGC GTTGCAGTCA
GGGTCGCCCG GGCCCCGTCG CCCAATCCGC



Fig.3(i) 
GCGCACCCCA CCCGCGGGCC GCTGAGTGGA 60
GGGAGAACTC TTCTGCACCG ATGGGAACTA 120
GGACACACCT CTCCCCATAA GCCCACTCAT 180
CCCATATCCA ATACCCGCAG ATATGATAGT 240
CCCTGCCTTC TGGCTTTCCC CCCCCCCTGC 300
GGCATGAAGG CTTAGGGTGG GGATCGGTAG 360
GCAACTTTCA AACTCTCTGG GGAAGGAAGA 420
ACTGCTCAAT GTGTGTGTGG CGGACCAAAG 480
GAAGGTGGAT AGGAAGGATC CCGGTAGACT 540
CGAGCTAGGA ACCCATTCGG AGTTAAGGGT 600
TGGGACGGGC GGGACCAGAG AGGGAGGTCC 660
TCGCGCAGGA GGATGGGACG TTCAGGAGTG 720
GCGCGGTGCC CGCGGGCGGT GGGAAGGCCG 780
[illegible] GGGCCGGGGG CGGGGCCGGG 840
ATTTCGGCTG CTCAGACTTG CTCCGGCCTT 900
ACCCGAGCCC CAATCTGCAC CCCGCAGACT 960
CCGCCCGTTG CGCGCCACCC CCATGCCCGC 1020
GCGGCGGCCG CCGCGGCCGC TGTCCTCGCT 1080



Fig.3(ii) 
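
The sheets of Fig.3 show the sequence in blocks of ten nucleotides, with the running position of the last nucleotide printed at the end of each row (60, 120, 180 and so on), which implies six blocks per row across a facing pair of sheets. The following small sketch reproduces that layout from a plain string. The name `format_blocks` and its parameters are hypothetical, and the example row assumes that the first three blocks of Fig.3(i) and the first three blocks of Fig.3(ii) together form the first row.

```python
def format_blocks(seq: str, block: int = 10, per_row: int = 6) -> list[str]:
    """Split a sequence into rows of space-separated blocks.

    Each row ends with the 1-based position of its last nucleotide,
    mirroring the numbering printed on the Fig.3 sheets.
    """
    rows = []
    row_len = block * per_row
    for start in range(0, len(seq), row_len):
        chunk = seq[start:start + row_len]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        end_pos = start + len(chunk)  # position of the last nucleotide in this row
        rows.append(" ".join(blocks) + f" {end_pos}")
    return rows


first_row = "".join([
    "GCGGCCGCTG", "CAGTGATTAC", "TCACCGCGTG",  # left-hand sheet
    "GCGCACCCCA", "CCCGCGGGCC", "GCTGAGTGGA",  # right-hand sheet
])
for line in format_blocks(first_row):
    print(line)
```

Splitting each row after the third block gives back the left-hand and right-hand halves as they appear on the two sheets.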
GTGGTCGCCT


CTGTTGL- id 


rp ^ ix» 'T' 


GTACCGTGCG 






AGTCGCGGGG 






GGCGGCCCTC 


GGGGCGCLL. i 


CAUC^ i G i GvjtLj 


AGTACCCCGT 


TATAC A 1 \w AG 


A OOrf^/^TT^'PT^ A 


AGGCTCAGTT 


m/^ TV TV /"^ TV TV rp 

TGAAGGAC A i 


CGGAvjf i Va i 


GCTTCGGGGC 


GCACGCCTGT 


GTC X IGGAIA 


GGGCGCACGC 


TTGGGTGCGT 


rn rn rn >Ti 

TGGGTTGGGT 


GAAGTGATGA 


TCCCCGGGGG 


GAGGGTGGGG 


ATGCGGCCCG 


GCGTCCCTCG 


GGACTTGCCT 


CTATAGCAGA 


CTCCATGCTT 


TGGT AT C CT C 


CGGTCTCATT 


CAGGCTGCGC 


TGGG i TGAGA 


CGAGAGCAAG 


CGTGTCCGGG 


CAv_ CGCGACjL. 


GGGGGTCAGC 


TGCCGAGAGA 


ATCCCACTGT 


ATCACCCAAC 


GCACACATCC 


CCGCCAGGAT 


CACACCCAAA 


GACACACAAA 


AGAGCCCCAC 


CGCGCGCTGC 


AGCCCAGATG 


CGTATTCGCA 


ACACACACAC 


ACACACACAC 


ACACACACAC 



Fig.3(iii) 

SUBSTITUTE SHEET (RULE 26)

WO 98/11225

PCT/GB97/02479


GGTGCCTCGG GGCGGATCGG GAGCCCGTGA 1140
GGGAAGCCGG GATCCGGCGC CCCGGGGGGT 1200
CGCCACCTGG ACGTCCCGGG AACAAAGGAA 1260
GCTCATGGCA CCACCACCCA GCCTCCCAAG 1320
TCTGTATCCC CTTTGCGAGG CTGTCTGGCC 1380
TGGGACCCCC CTCCTTCAGG GTGCTGGGAC 1440
TCAGAGCGGA AGGGAAGCCT CCCTGGCCGG 1500
GCTGGCGCAA AGTGGGGTCC CCTCCCCCAT 1560
CGTTATCGTG AGCCCTCCTG TCCGCCTGGC 1620
CTCCGTGGGG TCGGCGCCGC CCCCTCCCCC 1680
GAAGTCCTCT CCACTGGTGG GGCTCACAAC 1740
GCCTCTAGCG ACTGAAATTT CGGTGAGGAG 1800
CCAGACTTCA TTGTCTAAGG GGCACCCAGT 1860
CCCAGGAGGA ACTCCTGGCC TTGAGCCCCC 1920
GCGGTCTCCA CATCCAGACC CTCTCTGGGA 1980
TGGCTTATGT CCCGTCACCC TGCCCTCCGA 2040
CACCATCGCG GCGCTCGGAT TCCATCCTCT 2100
ACACACACAC ACACACAGAC ACGCACACAC 2160



Fig.3(iv) 




ACACGCACGC ACACACACGC ACGCCCGCAC 
GCAACACCGG GGTACGCATA TGGTTGAGTG 
ACCCCATCCG GAGACACAGG CCACACCGCA 
TAGTAGTCTT GTGCAGTTTG TCCGCGGTGT 
ACAGGAACCT ACACTCCTGC TTGCCCAAGG 
GACCTTTCCG GGGAGTTGGT GTTGCTGCCA 
GCGCTAAGCT TTGTTTCCGG GCGGGCTGCA 
TGGCGCGTGT GTTTTTTCTT TTAAGGGGGA 
TGCAATCTGT TTGTACTTAC CGTGTGTCTT 
AAAGTGTATG CAGGTACCAG CGGGACAGGA 
GAGGCCACCT TCCCGTTGGC CTTTCAGGGA 
GTGTTCTTTT TAATAACGGC AGCAACTCCG 
GGCCCCGGCT TTGTGGAAAG GAGGGGAAGA 
GGCTTAGGGG GCTGTCAGCT GCTGCTCTGT 
AGTGGCTTTG GCCCATTGTT TGTGGAAGCC 
TACTCCAGAG TCAGGCTTCT CAGTCCGAGC 
GAATCAGGGA AGGGGGTGCC AGGTGGACTA 
AAGGAGAAAG CTTGGGCTTG CCCCCCTCCC 



Fig.3(v) 




TPrJTGGTCCC 


ACATTTATTT 


CACAGGGGAG 


2220 




CTTTCCCCAC 


CACTCTCAGG 


2280 


naaacACCAC 


GCTGCGCTGC 


TGCTCTGGGC 


2340 




CCTCCCGCTC 


TTGTCAGGGG 


2400 




GGTGATGTGG 


TGACACCCGG 


2460 




GTTTTTGAAT 

\j X X X X X vj.n-t^ X 


GCCACCAATA 


2520 




CGAAGGTGGC 


GGAGTGGGGG 


2580 




ATA AriAGGTT 


CTCACACCTC 


2640 






GTGGGTCGTA 


2700 




X wV-JVJVJ X JTk. X 


GGCTGGGATG 


2760 




XXXV^v^V^XXXX 


AAAACACATG 


2820 




rxaaaa a a a t A 


AGCTTGTATA 


2880 


nnni^ AnA A A A 


A AGG AGGGGT 


GTCTCCTCCA 


2940 


CTAGCTTGGC 


ATGTGTGTGC 


CCCAGTCCCC 


3 000 


AAGAGGGAGA 


CTGGAGTCCT 


CTATCTCTGG 


3060 


CCAGAGAACG 


TCTTCCCTGT 


TTTATGGAGG 


3120 


CGTTCTGCTG 


AGGACTGTAC 


CAGTCGCTCG 


3180 


CCCTCAAGCC 


ACGAAGGGCA 


GCTGCTAGGC 


3240 



Fig.3(vi) 



TAGTGTGGTA AAAGGGCATT ACTCCCCAGC
CAGACAAATG CTGGGGAGGG ACAGAGGGGT
GGTCCCGGGT CGGGCAGTGC CTCCCACCCT
GGGTGGGCCG GGGTAGAGAC GCTGGCACGT
GCGGGCGGCT GGCTGCCTGG GACCTCCGGG
GCCTGCTCCT CCTGCTCCTT CGCACGGACG
CCCAAATGCA ACTGCGATTG CAGGCTTCGC
CCTGGGAGAA GTCATTCAGG GCCCAGACTA
GGGCATGAAG GACCGTCCAG GGCTGCAGTT
GCAGCCTCTG TTCTCCGAGC GTCTTTGGAA
AATACTCTTT TCCTCTCATC CCATCCCGGG
TGCAGTCTTC CCTAACCTTT TCTTTGCTTC
CCTCTCCCCT TGCCCAACTG GGGCTCCAGC
CAGGGCCTCT CTGACACACA GGGTTGTAGC
CTCTTTTGCT TCTGAGACTT AATTTTTTTC
TCTCTGTACA GCCCTGGCTG CCCTGGCACT
ACAAACCTAC CTGCCTCTGC CTTTCCAGTG
AGTAGTTAAG TGTTTTGCTG TGTCTTTATT



Fig.3(vii) 




C AGCa AC U C U C 




PPTTCCTGGC 


33 00 


/-irpr*^ TV n-ii^ TV rpi-p/-^ 
(jj 1 VjA 1 A 1 1 VJJ 




CAGACAGTGG 


3360 




V- V-rt 


AGGAAGCGGT 


3420 


r^r^r^ tv /^T'*T'/^ A 
CCCALjI iL-Ai 


nppp A Ann A a 


TTPTGAATTA 


3480 




Vjt VJJ V- V- v.^ ^ ^ 


nPTPPHTPTG 


3540 


CTGAGAv-t^ i U 






3600 


TV TV TV i^r^r^r^ 


T'/^r^T^ppp A An 


nPPAAATTTG 


3660 


GAAC CATGTT 


GG 1 GL-CACU i 


P ATPP AT*PTn 


3 72 0 


TAGCTTCTTA 


A i AGGAAv-L- i 




3780 


ATCGGTTTTG 


rp nri *T* nm^ HP 

iiiliLjilll 


TPTTTTTTPP 

X W XXXXXXV^Vw 


3840 


ACTGTTTTCC 


T C C G T AAGGG 




3 900 

*J -7 w V 


TAG C C C AGGG 


GGl i IGGAGA 


nrnnAPTPPPA 


3 960 


CTTAC J. GL. A i 


T'TT* n p T* p nrT*n 


PTAAPTGTCC 


4020 


CCCAGCTCCC 


TCTCTTCTCC 


TCCCCCCTTT 


4080 


TTTTTCTTTT 


TGGCTTTTTG 


AGACAGGGTT 


4140 


CATTCTGTAG 


ACCAGGCTAG 


CCTCAAACTC 


4200 


CTGGCACTAA 


AGATGTGGGC 


CACCACAACT 


4260 


CCTATAGTGA 


CCTCAGTTCC 


TGGCATATTG 


4320 



Fig.3(viii) 




TAGGCGATGG ATGGATGAAT GGATGGATGG
CTTGAATCGT CCTGAGTGAA AAAAGAGACC
GGCAGCCTGG CCTGCTGGTC TCATGGGAGC
CACCCTGCCA TCCTGTGTGG CTGACAAGAA
AGGGAAGCTT GGAATATGTT CCCCTCCTCA
CCAGCCTATG AGTAGGGCAG CTGTGGGCTG
GTCCCTCAGG GTGGGTCACA GGATTGAGGT
AGGAAATGAT TGTGGAGAGT CAGAACTCCT
GCTTCTGTGG CTGTCCCTTC TCTTGTGGTC
TGTGAGGAGG GCACGGGGAA AATGAAGGCT
CCAACAGGGC TCACCTCTCC TCTGGACAGG
TTTGATTCCC TTCCTTTGGT CTCCTGGGAT
TTTTAGATAT GTCCATTCTC CAGAAACACA
ACCACCAGGA CAGACAAAGA ATTGGAGAGG
TGGCTTATGT GTAATCCCAG AACTCTGGAC
CAGTGTGTTC TAGGTAATGA GACCCTGTCA
ATGTTTATAG GCTGTGAGAC AGCTTGGTGG
CCTCAGCCCC ATCCCTAGGA ATCCATGGTA

Fig.3(ix) 



ATGGATGGAT GGATGGTTGG ATGGAGCAAG 4380
TCAGAGAACT GAATGGAGTT AGGTTCCCAG 4440
TCCCTGTGAA ACTTCCCCCA CACCTCCCAC 4500
AGGCCAATGG CCAGATGGGG ACACAGACTC 4560
TATCCTAGGC CTTGTTGTCC CCCTGAGGGC 4620
CCCTAAGGTT GGGTAGGCAA GAAGGGGGTG 4680
CATTTCCAAA GTGGCCATCA CAGTGGCCCT 4740
GTTGGGAGTT GTAGAGGGCC TTGCATGTGG 4800
CTTTGCACAG TCCCCTCGTG TGTGCTGGGA 4860
CAGCCCCTGA GCTTGCCCTT CACGGTTCAC 4920
CTCTCACTGT ATGCACAGAT TGGCCTCACA 4980
GACAAACATT TACCAGGGTA GGATTTTACA 5040
CTTGTGAGGT TAGGGTATCA GTGAAAGGAC 5100
AAGGAAATTG GTAAGCCAGG CCATGCTTGA 5160
GCTGAGGCAG GAGGATTCCA AGTTTCAAGA 5220
AGAAAAGAAA AGAAATAAAG AGACAAGAAA 5280
GTAAGGGGCA CTTGCCTCCA ATCAAGATGA 5340
GAAGGAGAAA GCAAACTCCA GCTGCTGACC 5400



Fig.3(x) 



TCCATACATG TGCTCCAATG TGCACACACA
TTTGCTTAGA TTTGAGTAGG CATTTATGAC
GAAAATATAC CTGTTTGTAT TTGGTTTGGT
GCTTCTCTGT GTAGTCCTGG CTGTCCTTGG
ACTCAGAAAT CCGCCTGCTT GTGCTTCCCA
TCAGCAAAAT TGCATACTTT AACCCCAGTA
ATTCCAGGCT AGCCAAGGAT ACAGAGTGAG
CCAAAATGTA TTTTGTGCTT GTGTATGTAC
ACAACTTGTA GAAGTTCTCT CCGTTCACAG
AGGCTTAGCC ACAGTCTTCT TTATGTACTG
GAATTAATTT TTGAGATAAG GTCTCTTGTA
AAGGTCATCT TGAGCTGCTG GTACTCTTGC
GCAGCACTTC TCTGGGGAAG GGGCTGGCCT
GAGTGCTTGG GTCTCGTTGT TTCTTTTCTT
GACTTCCTGA CTCTTGAAAC ATCCAGGCAG
GCCTAACAAA GTGTCGTCTT TGACCCCAGA
CCTTCTCATC GGCTCCTCCC TGCAAGCTAC
CACCGCTGAG GGGCTCTACT GGACCTTCAA



Fig.3(xi) 




CAGGGAGACA TAATCAATTA ATAGGATGTA 5460
TGATGTTTTA AAATTTTTAT TTGATTTTAT 5520
TTGGTTTGAG TTTTGTTTAT TTGAGACAGG 5580
AACTCACTCT GTAGACCAGG CTGGCCTTGA 5640
AGTGCTTAGA TTAAAGGTGT GCACTGCCAT 5700
TTTGGGAGGC AGAGGCAGAC TAATGTGTGA 5760
ACCCTATTCT TACCCTCCCC CCCCAAAACC 5820
ATGTGTGTTG CAGCACGTAA ATGTCCAAGG 5880
TCTAAGTCCT GAATTCAAAC TAAGGTCCTC 5940
AGCCATTTCA CTGGCCCTGG ATTGACTGAT 6000
GCTCTAGCTA GGCTCAAACT ATGAACTCCC 6060
TTCCACCCCA AGTGGTGGAA TGATACTCAG 6120
TGGCCTTGAT TTTGTTGCCT CAGCTTCAAT 6180
TATCTGTGAA ATGGGTGAAC ACCTGTTCAA 6240
GGTGAGGGAC TTGAAGTGGG CTCATCCCAT 6300
CACAGCTGTA ATCAGCCCCC AGGACCCCAC 6360
CTGCTCTATA CATGGAGACA CACCTGGGGC 6420
TGGTCGCCGC CTGCCCTCTG AGCTGTCCCG 6480



Fig.3(xii) 



CCTCCTTAAC ACCTCCACCC TGGCCCTGGC
GTCAGGAGAC AATCTGGTGT GTCACGCCCG
CTATGTTGGC TGTAAGTGGG GCCCCAGACA
GATTTAGAGC CTGGGTCTTC TGTCCTGGGG
CATGGTCATA CCCAGCACAG GCATTGCAAC
TGTGTACCGC ACAGCTTTAG AAAAGCTGTC
CCTTTAACAT CAGCTGCTGG TCCCGGAACA
GTGCACACGG GGAGACATTC TTACATACCA
TACCCAGCCA AGCCTTGCTG TGTGACTTCT
TTCCTGTTTA TGAACTCAAA AGGGACTCTC
CACATGTGAG GAGTACCACA CTGTGGGCCC
CCTCTTCACT CCCTATGAGA TCTGGGTGGA
TGATGTCCTC ACACTGGATG TCCTGGACGT
GCCCTAGACC TTATAGGGCG CCTCCCCCCC
GTCTTAGCCA CAGCCACGGT GGTTGCAGGA
TTTCCCCCAA GACAGTCAAG ATTTTCCCCT
CTCTGCAGAG AACACCTGGC CTGACCACCC
GAGTCCTAGG GGACTGAGAG GAGGCGCCCA



Fig.3(xiii) 




CCTGGCTAAC CTTAATGGGT CCAGGCAGCA 6540
AGACGGCAGC ATTCTGGCTG GCTCCTGCCT 6600
CTCAGAGATA GATGGGGGTT GGCAATGACA 6660
CAGAGCCATG GGCTCTCACT TGCATGCAGG 6720
TCTAGGGACA GCTGTGGCTG CACTGTCCCC 6780
ATGTTTTCCT TGTAGTGCCC CCTGAGAAGC 6840
TGAAGGATCT CACGTGCCGC TGGACACCGG 6900
ACTACTCCCT CAAGTACAAG CTGAGGTTGG 6960
GGCAATACTT ACCTTCTCTG ATCAAATATG 7020
GCACCTCCAC AGGTGGTACG GTCAGGATAA 7080
TCACTCATGC CATATCCCCA AGGACCTGGC 7140
AGCCACCAAT CGCCTAGGCT CAGCAAGATC 7200
GGGTGAGCCC CCAGTGTCCA CCTGTGTTCT 7260
ATCCCCCCAG ACTTTTTGGT TCTTCTAGAG 7320
CAGTGGTTGT TCATAACTTA ATGCAAAGAC 7380
CCCCACCCCC AACACACACA TACACACACA 7440
TCCCTCTCTA CAGCCCAGGT GTTCAGAAGG 7500
GGTCTGAAGG CGCCCCAGGA AGCCGAGGCC 7560



Fig.3(xiv) 



TTGAGCTGGG GGGGGGGGCG AGGGTTGGAG
GGGCCTAATC TAATTAGGGT GTTCCCAGCC
GTGCCTCACT GAAGACTCAG GGGAGAGATC
GGGTTCCTGG GTGCCCCTGG CTCATTCCCA
TAACCCTCAG TTGTGCTCTG TGGCTGGCAC
CAAGGCATCA GAGGTGGACA TGGGATGGGG
AAGGTGGGGT GATATACAAT AAAGCTTGTC
GATCACAATT GTTGACATCA CTCTGGGACA
AGTAGCTTTA AGAGTCAGCT TGTGACTTAA
GTGATGCTCG CCTCACTCCC TGTTTAGTGA
GTGGGCTGCT CTGTCCCCTT GAGGGCAGGA
TGGTAGCAGC AACTGCTGCT GGCTGTTTCT
CTGGGTGAGT AGCTAACAGG GGTGGGGGCG
AGCCACTGCA GCCTAGATTA CACCACTGGG
AGTCCTCAGA ACTGGGAGCA CTGTTGCCAG
AGGGGAGGCA GAGGCAGAAG GATCTCTCTG
AGCTCCAGGC CAGCCAGGGT GCGCAGTAAA
TGACCAGGCT TGCTCCACCC CCAGTGACCA



Fig.3(xv) 






GCACGAACTG GATGATCCCT GAGCACAACT 7620
CAAAGCAGCC TGGGCCATTT AACCCTTCAA 7680
AGCTTGTACT CTCTCCATGG TCCCCCAGGA 7740
CATCCAGAGG TTTTGTGTCT TCCTGGCATC 7800
AGCTGCCCCG TGGAGGCTCT TGGTAATGTA 7860
ATACATAGGG ATGGAGCCAA ATAGCACCTC 7920
ACCCTGACGC TCAGAAAGCC TACTCATGAT 7980
TGTAGTGAGA CCCTAGCTCA AAACACAGAC 8040
TACTGGAACT CAGGGCCTAA TAGGTGCTGG 8100
GATCTCTGCG CTAATCTCCA CCCCAGCTGG 8160
ATGTGTGTCT TCCATCAGAG ATAGGACCCG 8220
GGAATATTAA ATGACAGTAA TCTATCAGGC 8280
TGGTCTGGAA AACGCAGATA GGGTCATAGG 8340
TGTTCTGTCA CTAGGCCATT CTCACCAAGC 8400
CATTTAATGC CAGCATTTAA TGCCAGCATT 8460
AGTTCAAGGC CATCCTGAAT TTACATAAAG 8520
ACCTTGTCTC AAAAAACAAA GCATCTTTAG 8580
CGGACCCCCC ACCCGACGTG CACGTGAGCC 8640



Fig.3(xvi) 




GCGTTGGGGG CCTGGAGGAC CAGCTGAGTG
ATTTCCTCTT CCAAGCCAAG TACCAGATCC
AGGTGCCCGT CCCGCCCCGG ACCCGCCCCT
CACCGTGCAG GTGGTGGATG ACGTCAGCAA
GCCCGGCACC GTTTACTTCG TCCAAGTGCG
AAAGGCGGGA ATCTGGAGCG AGTGGAGCCA
TGAGCACCTC TCCAGGGCTG GCTGGCCCAT
CCCACCCTTT TTTTGAGACA GCGTCTTCAG
TAGTCAAGGA TGACCTCGAG CTCCTGGTCT
GGCCATCACC ACCTTTGGGA GACTAGCCAT
GATGGAGTAC AACAGTGTGA CCTCTTGTAA
AATATCCTAG GCTCTCTAGA GGTTAACTTT
TCACATGGTC CCACAGAACC TTTTGTCACA
CACATAAGGG TCTCTACTGC TGGCCCACCC
CTTAATATTT GCAATCCTCC TACCTCAGCC
CAAGTTTCTC TTCTCTGGGT CCCTTTCTTA
GTCCTGAAGA CTCTCCGAGC CCATGGATCT
AATGTCTGGC CTCAGTTTCC CCACCTGTCA



Fig.3(xvii) 






TGCGCTGGGT CTCACCACCA GCTCTCAAGG 8700
GCTACCGCGT GGAGGACAGC GTGGACTGGA 8760
GACCCCGCCC CCCGCATCTG ACTCCTCCCT 8820
CCAGACCTCC TGCCGTCTCG CGGGCCTGAA 8880
TTGTAACCCA TTCGGGATCT ATGGGTCGAA 8940
CCCCACCGCT GCCTCCACCC CTCGAAGTGG 9000
GGAATCCCCA ATCCATCCTG TTCCTTCCCC 9060
GTAGCGCATG CTGGCCTTAA ATTCAGTATG 9120
TTTTGTCTCC ACTTAGAGAC AATGGCCAGT 9180
GGAGTCTATT TAGCCTGTCA TTTGGTGACA 9240
GAGAACTGAA GACAGGCTGT TTTTAACCCC 9300
ATATAAAATA GAGACTATTA CAGCCAGTTA 9360
CAACCTATAG ACCACAGTGC CTGTGCCTAC 9420
CTCCAACCCT TAAAAGGTAA CCTAGGCAGC 9480
TCTTGAATGC TCAGAAACCA GGCATTAACC 9540
AGGTGGGAGG GCCTAAAGAT GACTTCCTTT 9600
GCACTCTCTA ATATGAAATA TATTGCATAA 9660
GGTTTAGGCA GCACAGTCGG TCCAAGACAC 9720



Fig.3(xviii) 




TTCATTATTT GCAGGCAGTA TAAGAAGAAG
CTAAGACAGA ATACTTCTAC ACTGAAACTG
TGATGATGAA ATAATGGGGA AACTGAGGCT
ACCAGCTCCA GGAAGCTCTC CAGCCCCCAT
GAGTGAACAC AGCTGGGAGG GGCTGGAGCC
ACCTGCGATT CTTGCACGGG AGCCAGCAGG
CCGGGGGTAG GGTTGGAGGG AGGTAAGCAG
CCTGTCAGCG AGTCCCCAGT TTTATTTATG
TGCTGGGGGA TGGCTGCGGC TGGGGATTGG
CAGCCCACTC CATGTCACAC CCGTGCATTC
TTCTGTGCTG TCTGTCTCTA TTTCTGTCAT
TTAATATAAC TACGTTTTAA AAATTGCTTT
GTGCCACAAC ACACACGTGA AGGTTAGAGA
GGGACTAGGG CTGGCGACAA GAGCAATTAC
CTTCCCATCC TGTTTGGATA GTCATAGGTA
TAGCTATCCT GCCTCAGCCT ACCAAGTGCT
TCCCAGTGTC TGGGGGTACA CAGTCCCAAG
TGCCCCTTGC TTTGTCCGTG TCCCTAGAGT



Fig.3(xix) 




CTCCCATCCC CCACCCGCTT CCTCCGGTCC 9780
AACTCTCGCA GACGCATATG CTCACTTTAA 9840
CCGAGAGATT CCTGGAGGAA GAGGGTCAAA 9900
CCGGGCCTCT CCAGGTTCTG GGCTTGGCGG 9960
TGGGAGCTTT GGCCCTTGCT CGTGCCCAGC 10020
CGGCTGCGTC CGCCCGAGAG ACTGAAGAAG 10080
GGGCTGTGGG GGCCGAAGCT TGTGCCAGGG 10140
GCGTGAGGCC GATGTCCTTA TCCGCTGGCC 10200
ACCCAAGGGC TGGCTTCCCA CTCAGTCCTC 10260
TCTGAGGCTT ATCTTGGGAA CCCGCCCTTG 10320
TCACTTTCCC AGAGCCTTTT TTTTATGCTT 10380
TGTATAATGT GTGTGCCTTC GTGAGCGTGC 10440
ACTTTGTTGA GTAGGCTCCT

haemopoietin receptor or derivatives thereof and to
genetic sequences encoding same. Interaction between
the novel receptor of the present invention and a ligand
facilitates proliferation, differentiation and survival
of a wide variety of cells. The novel receptor and its
derivatives and the genetic sequences encoding same of
the present invention are useful in the development of a
wide range of agonists, antagonists, therapeutics and
diagnostic reagents based on ligand interaction with its
receptor.

Bibliographic details of the publications numerically
referred to in this specification are collected at the
end of the description. Sequence Identity Numbers (SEQ
ID NOs.) for the nucleotide and amino acid sequences
referred to in the specification are defined following
the bibliography.

Throughout this specification and the claims which
follow, unless the context requires otherwise, the word
"comprise", or variations such as "comprises" or
"comprising", will be understood to imply the inclusion
of a stated integer or group of integers but not the
exclusion of any other integer or group of integers.

The rapidly increasing sophistication of recombinant DNA
techniques is greatly facilitating research into the
medical and allied health fields. Cytokine research is
of particular importance, especially as these molecules
regulate the proliferation, differentiation and function
of a wide variety of cells. Administration of
recombinant cytokines or regulating cytokine function
and/or synthesis is becoming increasingly the focus of
medical research into the treatment of a range of
disease conditions.

Despite the discovery of a range of cytokines and other
secreted regulators of cell function, comparatively few
cytokines are directly used or targeted in therapeutic
regimens. One reason for this is the pleiotropic nature
of many cytokines. For example, interleukin (IL)-11 is
a functionally pleiotropic molecule (1,2), initially
characterized by its ability to stimulate proliferation
of the IL-6-dependent plasmacytoma cell line, T1165
(3). Other biological actions of IL-11 include
induction of multipotential haemopoietin progenitor cell
proliferation (4,5,6), enhancement of megakaryocyte and
platelet formation (7,8,9,10), stimulation of acute
phase protein synthesis (11) and inhibition of adipocyte
lipoprotein lipase activity (12,13).

Other important cytokines in the IL-11 group include IL-6,
leukaemia inhibitory factor (LIF), oncostatin M (OSM)
and CNTF. All these cytokines exhibit pleiotropic
properties with significant activities in proliferation,
differentiation and survival of cells. Members of the
haemopoietin receptor family are defined by the presence
of a conserved amino acid domain in their extracellular
region. However, despite the low level of amino acid
sequence conservation between other haemopoietin
receptor domains of different receptors, they are all
predicted to assume a similar tertiary structure,
centred around two fibronectin-type III repeats (18,19).

The size of the haemopoietin receptor family has now
become extensive and includes the cell surface receptors
for many cytokines including interleukin-2 (IL-2), IL-3,
IL-4, IL-5, IL-6, IL-7, IL-9, IL-11, IL-12, IL-13, IL-15,
granulocyte colony stimulating factor (G-CSF),
granulocyte-macrophage-CSF (GM-CSF), erythropoietin,
thrombopoietin, leptin, leukaemia inhibitory factor,
oncostatin-M, ciliary neurotrophic factor,
cardiotrophin, growth hormone and prolactin. Although
most of the members of the haemopoietin receptor family
act as classic cell surface receptors, binding their
cognate ligand at the cell surface and initiating
intracellular signal transduction, some receptors are
also produced in naturally occurring soluble forms.
These soluble receptors can either act as cytokine
antagonists, by binding to cytokines and inhibiting
productive interactions with cell surface receptors (eg
LIF binding protein (20)) or as agonists, binding to
cytokine and potentiating interaction with cell surface
receptor components (eg soluble interleukin-6 receptor
α-chain (21)). Still other members of the family appear
to be produced only as secreted proteins, with no
evidence of a cell surface form. In this regard, the
IL-12 p40 subunit is a useful example. The cytokine
IL-12 is secreted as a heterodimer composed of a p35
subunit which shows similarity to cytokines such as IL-6
(22) and a p40 subunit which shares similarity with the
IL-6 receptor α-chain (23). In this case the soluble
receptor acts as part of the cytokine itself and is
essential to formation of an active protein. In
addition to acting as cytokines (eg IL-12 p40), cytokine
agonists (eg IL-6 receptor α-chain) or cytokine
antagonists (eg LIF binding protein), members of the
haemopoietin receptor family have been useful in the
discovery of small molecule cytokine mimetics. For
example, the discovery of peptide mimetics of two
commercially valuable cytokines, erythropoietin and
thrombopoietin, centred on the selection of peptides
capable of binding to soluble versions of the
erythropoietin and thrombopoietin receptors (24,25). Due
to the importance and multifactorial nature of these
cytokines, there is a need to identify receptors,
including both cell bound and soluble, for pleiotropic
cytokines. Identification of such receptors permits the
identification of pleiotropic cytokines and the
development of a range of therapeutic and diagnostic
agents.

Accordingly, one aspect of the present invention relates
to a nucleic acid molecule comprising a sequence of
nucleotides encoding or complementary to a sequence
encoding a novel haemopoietin receptor or a derivative
thereof.

More particularly, the present invention provides a
nucleic acid molecule comprising a sequence of
nucleotides encoding or complementary to a sequence
encoding a novel haemopoietin receptor or a derivative
thereof having the motif:

Trp Ser Xaa Trp Ser [SEQ ID NO:1],

wherein Xaa is any amino acid and is preferably Asp or
Glu.

Even more particularly, the present invention is
directed to a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a
sequence encoding a novel haemopoietin receptor or a
derivative thereof, said receptor comprising the motif:

Trp Ser Xaa Trp Ser [SEQ ID NO:1]

wherein Xaa is any amino acid and is preferably Asp or
Glu, said nucleic acid molecule is identifiable by
hybridisation to said molecule under low stringency
conditions at 42°C with

5' (A/G)CTCCA(A/G)TC(A/G)CTCCA 3' [SEQ ID NO:7]

and

5' (A/G)CTCCA(C/T)TC(A/G)CTCCA 3' [SEQ ID NO:8].
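
By way of illustration only, the relationship between the two degenerate oligonucleotides and the motif of SEQ ID NO:1 can be checked with a short Python sketch. It assumes that the oligonucleotides are antisense to the coding strand, so each variant is reverse-complemented before translation; the function names and the reduced codon table are assumptions introduced here, not part of the specification.

```python
from itertools import product

# (A/G) is written R and (C/T) is written Y, following IUPAC ambiguity codes.
AMBIGUITY = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Only the codons needed for this check, taken from the standard genetic code.
CODONS = {
    "TGG": "W",
    "AGC": "S", "AGT": "S",
    "GAC": "D", "GAT": "D",
    "GAA": "E", "GAG": "E",
}

SEQ_ID_NO_7 = "RCTCCARTCRCTCCA"  # 5' (A/G)CTCCA(A/G)TC(A/G)CTCCA 3'
SEQ_ID_NO_8 = "RCTCCAYTCRCTCCA"  # 5' (A/G)CTCCA(C/T)TC(A/G)CTCCA 3'


def expand(oligo):
    """Return every unambiguous sequence covered by a degenerate oligo."""
    return ["".join(p) for p in product(*(AMBIGUITY[b] for b in oligo))]


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def translate(seq):
    """Translate an in-frame sequence using the reduced codon table."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))


def encoded_peptides(oligo):
    """Peptides encoded by the strand complementary to each oligo variant."""
    return {translate(reverse_complement(s)) for s in expand(oligo)}
```

Under these assumptions, every variant of SEQ ID NO:7 is complementary to a sequence encoding Trp Ser Asp Trp Ser, and every variant of SEQ ID NO:8 to one encoding Trp Ser Glu Trp Ser, which is consistent with Xaa preferably being Asp or Glu.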



Still more particularly, the present invention provides
an isolated nucleic acid molecule comprising a sequence
of nucleotides substantially as set forth in SEQ ID
NO:12 or a nucleotide sequence having at least 60%
similarity to the nucleotide sequence set forth in SEQ
ID NO:12 or a nucleotide sequence capable of hybridising
thereto under low stringency conditions at 42°C and
wherein said nucleotide sequence encodes a novel
haemopoietin receptor or a derivative thereof.

In a related embodiment, the present invention provides
an isolated nucleic acid molecule comprising a sequence
of nucleotides substantially as set forth in SEQ ID
NO:14 or a nucleotide sequence having at least 60%
similarity to the nucleotide sequence set forth in SEQ
ID NO:14 or a nucleotide sequence capable of hybridising
thereto under low stringency conditions at 42°C and
wherein said nucleotide sequence encodes a novel
haemopoietin receptor or a derivative thereof.

In another related embodiment, the present invention
provides an isolated nucleic acid molecule comprising a
sequence of nucleotides substantially as set forth in
SEQ ID NO:16 or a nucleotide sequence having at least
60% similarity to the nucleotide sequence set forth in
SEQ ID NO:16 or a nucleotide sequence capable of
hybridising thereto under low stringency conditions at
42°C and wherein said nucleotide sequence encodes a
novel haemopoietin receptor or a derivative thereof.

In a further related embodiment, the present invention
provides an isolated nucleic acid molecule comprising a
sequence of nucleotides substantially as set forth in
SEQ ID NO:18 or a nucleotide sequence having at least
60% similarity to the nucleotide sequence set forth in
SEQ ID NO:18 or a nucleotide sequence capable of
hybridising thereto under low stringency conditions at
42°C and wherein said nucleotide sequence encodes a
novel haemopoietin receptor or a derivative thereof.

In yet a further related embodiment, the present
invention provides an isolated nucleic acid molecule
comprising a sequence of nucleotides substantially as
set forth in SEQ ID NO:24 or a nucleotide sequence
having at least 60% similarity to the nucleotide
sequence set forth in SEQ ID NO:24 or a nucleotide
sequence capable of hybridising thereto under low
stringency conditions at 42°C and wherein said
nucleotide sequence encodes a novel haemopoietin
receptor or a derivative thereof.

Still yet a further embodiment of the present invention
is directed to a sequence of nucleotides substantially
as set forth in SEQ ID NO:28 or a nucleotide sequence
having at least 60% similarity to the nucleotide
sequence set forth in SEQ ID NO:28 or a nucleotide
sequence capable of hybridising thereto under low
stringency conditions at 42°C and wherein said
nucleotide sequence encodes a novel haemopoietin
receptor or a derivative thereof.

In still yet another embodiment, the present invention
provides an isolated nucleic acid molecule comprising a
sequence of nucleotides substantially as set forth in SEQ
ID NO:38 or a nucleotide sequence having at least 60%
similarity to the nucleotide sequence set forth in SEQ
ID NO:38 or a nucleotide sequence capable of hybridising
thereto under low stringency conditions at 42°C and
wherein said nucleotide sequence encodes a novel
haemopoietin receptor or a derivative thereof.

The term "receptor" is used in its broadest sense and
includes any molecule capable of binding, associating or
otherwise interacting with a ligand. Generally, the
interaction will have a signalling effect although the
present invention is not necessarily so limited. For
example, the "receptor" may be in soluble form, often
referred to as a cytokine binding protein. A receptor
may be deemed a receptor notwithstanding that its ligand
or ligands has or have not been identified.

Preferably, the novel receptor is derived from a mammal
or a species of bird. Particularly preferred mammals
include humans, primates, laboratory test animals (e.g.
mice, rats, rabbits, guinea pigs), livestock animals
(e.g. sheep, horses, pigs, cows), companion animals
(e.g. dogs, cats) or captive wild animals (e.g. deer,
foxes, kangaroos). Although the present invention is
exemplified with respect to mice, the scope of the
subject invention extends to all animals and in
particular humans.

The present invention is predicated in part on an
ability to identify members of the haemopoietin receptor
family with limited sequence similarity. Based on this
approach, a genetic sequence has been identified in
accordance with the present invention which encodes a
novel receptor. The expressed genetic sequence is
referred to herein as "NR6". Different forms of NR6 are
referred to as, for example, NR6.1, NR6.2 and NR6.3.
The nucleotide and corresponding amino acid sequences
for these molecules are represented in SEQ ID NOs:12, 14
and 16, respectively.

Preferred human and murine nucleic acid sequences for
NR6 or its derivatives include sequences from brain,
liver, kidney, neonatal, embryonic, cancer or
tumour-derived tissues.

Reference herein to a low stringency at 42°C includes
and encompasses from at least about 1% v/v to at least
about 15% v/v formamide and from at least about 1M to at
least about 2M salt for hybridisation, and at least
about 1M to at least about 2M salt for washing
conditions. Alternative stringency conditions may be
applied where necessary, such as medium stringency,
which includes and encompasses from at least about 16%
v/v to at least about 30% v/v formamide and from at
least about 0.5M to at least about 0.9M salt for
hybridisation, and at least about 0.5M to at least about
0.9M salt for washing conditions, or high stringency,
which includes and encompasses from at least about 31%
v/v to at least about 50% v/v formamide and from at
least about 0.01M to at least about 0.15M salt for
hybridisation, and at least about 0.01M to at least
about 0.15M salt for washing conditions.
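
The three stringency bands recited above can be summarised as a small lookup, sketched here in Python for illustration. The sketch assumes that each "from at least about X to at least about Y" range is read as a closed interval and that values falling between the recited bands are simply unclassified; the names `STRINGENCY` and `classify_stringency` are hypothetical.

```python
# (formamide % v/v range, salt molarity range) as recited in the text.
STRINGENCY = {
    "low": ((1.0, 15.0), (1.0, 2.0)),
    "medium": ((16.0, 30.0), (0.5, 0.9)),
    "high": ((31.0, 50.0), (0.01, 0.15)),
}


def classify_stringency(formamide_pct, salt_molar):
    """Return 'low', 'medium' or 'high', or None if no recited band matches."""
    for name, ((f_lo, f_hi), (s_lo, s_hi)) in STRINGENCY.items():
        # Both the formamide and the salt value must fall inside the band.
        if f_lo <= formamide_pct <= f_hi and s_lo <= salt_molar <= s_hi:
            return name
    return None
```

For example, 10% v/v formamide with 1.5M salt falls in the low stringency band, whereas 10% v/v formamide with 0.1M salt matches none of the recited combinations.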

The nucleic acid molecules contemplated by the present
invention are generally in isolated form and are
preferably cDNA or genomic DNA molecules. In a
particularly preferred embodiment, the nucleic acid
molecules are in vectors and most preferably expression
vectors to enable expression in a suitable host cell.
Particularly useful host cells include prokaryotic
cells, mammalian cells, yeast cells and insect cells.
The cells may also be in the form of a cell line.

Accordingly, another aspect of the present invention
provides an expression vector comprising a nucleic acid
molecule encoding the novel haemopoietin receptor or a
derivative thereof as hereinbefore described, said
expression vector capable of expression in a selected
host cell.

Another aspect of the present invention contemplates a
method for cloning a nucleotide sequence encoding NR6 or
a derivative thereof, said method comprising searching a
nucleotide data base for a sequence which encodes the
amino acid sequence set forth in SEQ ID NO:1, designing
one or more oligonucleotide primers based on the
nucleotide sequence located in the search, screening a
nucleic acid library with said one or more
oligonucleotides and obtaining a clone therefrom which
encodes said NR6 or part thereof.
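
The data base searching step of this method can be pictured with the following Python sketch, offered for illustration only. It translates a nucleotide sequence in all six reading frames and reports any occurrence of the Trp Ser Xaa Trp Ser motif; the function names, the use of a regular expression and the choice of the standard genetic code are assumptions introduced here, and a practical search would be run against a sequence data base rather than a single string.

```python
import re

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

# Trp Ser Xaa Trp Ser, where Xaa is any one of the twenty standard residues.
MOTIF = re.compile(r"WS[ACDEFGHIKLMNPQRSTVWY]WS")


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def translate(seq):
    """Translate complete codons; unknown codons are written as 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_motif(dna):
    """Return (strand, frame, peptide) for each motif hit in six frames."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            for match in MOTIF.finditer(protein):
                hits.append((strand, frame, match.group()))
    return hits
```

A sequence located in this way would then serve as the basis for designing the oligonucleotide primers used to screen the nucleic acid library.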

Once a novel nucleotide sequence is obtained as
indicated above encoding NR6, oligonucleotides may be
designed which bind cDNA clones with high stringency.
Direct colony hybridisation may be employed or PCR
amplification may be used. The use of oligonucleotide
primers which bind under conditions of high stringency
ensures rapid cloning of a molecule encoding the novel
NR6 and less time is required in screening out cloning
artefacts. However, depending on the primers used, low
or medium stringency conditions may also be employed.

Alternatively, a library may be screened directly such
as using oligonucleotides set forth in SEQ ID NO:7 or
SEQ ID NO:8 or a mixture of both oligonucleotides may be
used. In addition, one or more of oligonucleotides
defined in SEQ ID NO:2 to 11 may also be used.

Preferably, the nucleic acid library is a cDNA, genomic,
cDNA expression or mRNA library.

Preferably, the nucleic acid library is a cDNA
expression library.

Preferably, the nucleotide data base is of human or
murine origin and of brain, liver, kidney, neo-natal
tissue, embryonic tissue, tumour or cancer tissue
origin.

Preferred percentage similarities to the reference
nucleotide sequences include at least about 70%, more
preferably at least about 80%, still more preferably at
least about 90% and even more preferably at least about
95% or above.
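
The specification does not state how percentage similarity is to be calculated. Purely as an illustrative assumption, the Python sketch below scores percentage identity over two sequences that have already been aligned to equal length, with "-" marking gaps; the function name `percent_identity` is hypothetical, and other similarity measures would give different figures.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned columns in which both sequences carry the same residue.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

A candidate sequence could then be compared against a reference such as SEQ ID NO:12 and tested against the 60%, 70%, 80%, 90% or 95% thresholds recited above.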

Another aspect of the present invention provides an
isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a novel haemopoietin receptor or
derivative thereof having an amino acid sequence as set
forth in SEQ ID NO:13 or having at least about 50%
similarity to all or part thereof.

Still yet another aspect of the present invention
provides an isolated nucleic acid molecule comprising a
sequence of nucleotides encoding a novel haemopoietin
receptor or derivative thereof having an amino acid
sequence as set forth in SEQ ID NO:15 or having at least
about 50% similarity to all or part thereof.

Even yet another aspect of the present invention
provides an isolated nucleic acid molecule comprising a
sequence of nucleotides encoding a novel haemopoietin
receptor or derivative thereof having an amino acid
sequence as set forth in SEQ ID NO:17 or having at least
about 50% similarity to all or part thereof.

A further aspect of the present invention provides an
isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a novel haemopoietin receptor or
derivative thereof having an amino acid sequence as set
forth in SEQ ID NO:19 or having at least about 50%
similarity to all or part thereof.

Even yet another aspect of the present invention
provides an isolated nucleic acid molecule comprising a
sequence of nucleotides encoding a novel haemopoietin
receptor or derivative thereof having an amino acid
sequence as set forth in SEQ ID NO:25 or having at least
about 50% similarity to all or part thereof.

Another aspect of the present invention provides an
isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a novel haemopoietin receptor or
derivative thereof having an amino acid sequence as set
forth in one or more of SEQ ID NOs:29 or having at least
about 50% similarity to all or part thereof.

Preferably, the percentage amino acid similarity is at
least about 60%, more preferably at least about 70%,
even more preferably at least about 80-85% and still
even more preferably at least about 90-95% or greater.

The NR6 polypeptide contemplated by the present
invention includes, therefore, derivatives which are
components, parts, fragments, homologues or analogues of
the novel haemopoietin receptors which are preferably
encoded by all or part of a nucleotide sequence
substantially set forth in SEQ ID NO:12 or 14 or 16 or
18 or 25 or 20 or 24 or 28 or 38 or a molecule having at
least about 60% nucleotide similarity to all or part
thereof or a molecule capable of hybridising to the
nucleotide sequence set forth in SEQ ID NO:12 or 14 or
16 or 18 or 20 or 24 or 28 or 38 or a complementary form
thereof. The NR6 molecule may be glycosylated or
non-glycosylated. When in glycosylated form, the
glycosylation may be substantially the same as naturally
occurring haemopoietin receptor or may be a modified form
of glycosylation. Altered or differential glycosylation
states may or may not affect binding activity of the
novel receptor.

The NR6 haemopoietin receptor may be in soluble form or
may be expressed on a cell surface or conjugated or
fused to a solid support or another molecule.

As stated above, the present invention further
contemplates a range of derivatives of NR6. Derivatives
include fragments, parts, portions, mutants, homologues
and analogues of the NR6 polypeptide and corresponding
genetic sequence. Derivatives also include single or
multiple amino acid substitutions, deletions and/or
additions to NR6 or single or multiple nucleotide
substitutions, deletions and/or additions to the genetic
sequence encoding NR6. "Additions" to amino acid
sequences or nucleotide sequences include fusions with
other peptides, polypeptides or proteins or fusions to
nucleotide sequences. Reference herein to "NR6"
includes reference to all derivatives thereof including
functional derivatives or NR6 immunologically
interactive derivatives.

Analogues of NR6 contemplated herein include, but are
not limited to, modification to side chains,
incorporating of unnatural amino acids and/or their
derivatives during peptide, polypeptide or protein
synthesis and the use of crosslinkers and other methods
which impose conformational constraints on the
proteinaceous molecule or their analogues.

Examples of side chain modifications contemplated by the
present invention include modifications of amino groups
such as by reductive alkylation by reaction with an
aldehyde followed by reduction with NaBH4; amidination
with methylacetimidate; acylation with acetic anhydride;
carbamoylation of amino groups with cyanate;
trinitrobenzylation of amino groups with
2,4,6-trinitrobenzene sulphonic acid (TNBS); acylation of
amino groups with succinic anhydride and
tetrahydrophthalic anhydride; and pyridoxylation of
lysine with pyridoxal-5-phosphate followed by reduction
with NaBH4.

The guanidine group of arginine residues may be modified
by the formation of heterocyclic condensation products
with reagents such as 2,3-butanedione, phenylglyoxal and
glyoxal.

The carboxyl group may be modified by carbodiimide
activation via O-acylisourea formation followed by
subsequent derivitisation, for example, to a
corresponding amide.

Sulphydryl groups may be modified by methods such as
carboxymethylation with iodoacetic acid or
iodoacetamide; performic acid oxidation to cysteic acid;
formation of mixed disulphides with other thiol
compounds; reaction with maleimide, maleic anhydride or
other substituted maleimide; formation of mercurial
derivatives using 4-chloromercuribenzoate,
4-chloromercuriphenylsulphonic acid, phenylmercury
chloride, 2-chloromercuri-4-nitrophenol and other
mercurials; carbamoylation with cyanate at alkaline pH.

Tryptophan residues may be modified by, for example,
oxidation with N-bromosuccinimide or alkylation of the
indole ring with 2-hydroxy-5-nitrobenzyl bromide or
sulphenyl halides. Tyrosine residues, on the other hand,
may be altered by nitration with tetranitromethane to
form a 3-nitrotyrosine derivative.

Modification of the imidazole ring of a histidine
residue may be accomplished by alkylation with
iodoacetic acid derivatives or N-carbethoxylation with
diethylpyrocarbonate.

Examples of incorporating unnatural amino acids and
derivatives during peptide synthesis include, but are
not limited to, use of norleucine, 4-amino butyric acid,
4-amino-3-hydroxy-5-phenylpentanoic acid,
6-aminohexanoic acid, t-butylglycine, norvaline,
phenylglycine, ornithine, sarcosine,
4-amino-3-hydroxy-6-methylheptanoic acid, 2-thienyl
alanine and/or D-isomers of amino acids. A list of
unnatural amino acids contemplated herein is shown in
Table 1.

These types of modifications may be important to
stabilise NR6 if administered to an individual or for
use as a diagnostic reagent.

Crosslinkers can be used, for example, to stabilise 3D
conformations, using homo-bifunctional crosslinkers such
as the bifunctional imido esters having (CH2)n spacer
groups with n=1 to n=6, glutaraldehyde,
N-hydroxysuccinimide esters and hetero-bifunctional
reagents which usually contain an amino-reactive moiety
such as N-hydroxysuccinimide and another group
specific-reactive moiety such as maleimido or dithio
moiety (SH) or carbodiimide (COOH). In addition, peptides
can be conformationally constrained by, for example,
incorporation of Cα and Nα-methylamino acids,
introduction of double bonds between Cα and Cβ atoms of
amino acids and the formation of cyclic peptides or
analogues by introducing covalent bonds such as forming
an amide bond between the N and C termini, between two
side chains or between a side chain and the N or C
terminus.

TABLE 1

Non-conventional amino acid    Code

aminobutyric acid    Abu
amino-α-methylbutyrate    Mgabu
aminocyclopropane-carboxylate    Cpro
aminoisobutyric acid    Aib
aminonorbornyl-carboxylate    Norb
cyclohexylalanine    Chexa
cyclopentylalanine    Cpen
D-alanine    Dal
D-arginine    Darg
D-aspartic acid    Dasp
D-cysteine    Dcys
D-glutamine    Dgln
D-glutamic acid    Dglu
D-histidine    Dhis
D-isoleucine    Dile
D-leucine    Dleu
D-lysine    Dlys
D-methionine    Dmet
D-ornithine    Dorn
D-phenylalanine    Dphe
D-proline    Dpro
D-serine    Dser
D-threonine    Dthr
D-tryptophan    Dtrp
D-tyrosine    Dtyr
D-valine    Dval
D-α-methylalanine    Dmala
D-α-methylarginine    Dmarg
D-α-methylasparagine    Dmasn
D-α-methylaspartate    Dmasp
L-N-methylalanine    Nmala
L-N-methylarginine    Nmarg
L-N-methylasparagine    Nmasn
L-N-methylaspartic acid    Nmasp
L-N-methylcysteine    Nmcys
L-N-methylglutamine    Nmgln
L-N-methylglutamic acid    Nmglu
L-N-methylhistidine    Nmhis
L-N-methylisoleucine    Nmile
L-N-methylleucine    Nmleu
L-N-methyllysine    Nmlys
L-N-methylmethionine    Nmmet
L-N-methylnorleucine    Nmnle
L-N-methylnorvaline    Nmnva
L-N-methylornithine    Nmorn
L-N-methylphenylalanine    Nmphe
L-N-methylproline    Nmpro
L-N-methylserine    Nmser
L-N-methylthreonine    Nmthr
L-N-methyltryptophan    Nmtrp
L-N-methyltyrosine    Nmtyr
L-N-methylvaline    Nmval
L-N-methylethylglycine    Nmetg
L-N-methyl-t-butylglycine    Nmtbug
L-norleucine    Nle
L-norvaline    Nva
α-methyl-aminoisobutyrate    Maib
α-methyl-γ-aminobutyrate    Mgabu
α-methylcyclohexylalanine    Mchexa
α-methylcyclopentylalanine    Mcpen
α-methyl-α-napthylalanine    Manap
α-methylpenicillamine    Mpen
D-α-methylcysteine    Dmcys
D-α-methylglutamine    Dmgln
D-α-methylhistidine    Dmhis
D-α-methylisoleucine    Dmile
D-α-methylleucine    Dmleu
D-α-methyllysine    Dmlys
D-α-methylmethionine    Dmmet
D-α-methylornithine    Dmorn
D-α-methylphenylalanine    Dmphe
D-α-methylproline    Dmpro
D-α-methylserine    Dmser
D-α-methylthreonine    Dmthr
D-α-methyltryptophan    Dmtrp
D-α-methyltyrosine    Dmty
D-α-methylvaline    Dmval
D-N-methylalanine    Dnmala
D-N-methylarginine    Dnmarg
D-N-methylasparagine    Dnmasn
D-N-methylaspartate    Dnmasp
D-N-methylcysteine    Dnmcys
D-N-methylglutamine    Dnmgln
D-N-methylglutamate    Dnmglu
D-N-methylhistidine    Dnmhis
D-N-methylisoleucine    Dnmile
D-N-methylleucine    Dnmleu
D-N-methyllysine    Dnmlys
N-methylcyclohexylalanine    Nmchexa
D-N-methylornithine    Dnmorn
N-methylglycine    Nala
N-methylaminoisobutyrate    Nmaib
N-(1-methylpropyl)glycine    Nile
N-(2-methylpropyl)glycine    Nleu
D-N-methyltryptophan    Dnmtrp
D-N-methyltyrosine    Dnmtyr
D-N-methylvaline    Dnmval
γ-aminobutyric acid    Gabu
L-t-butylglycine    Tbug
N-(4-aminobutyl)glycine    Nglu
N-(2-aminoethyl)glycine    Naeg
N-(3-aminopropyl)glycine    Norn
N-amino-α-methylbutyrate    Nmaabu
α-napthylalanine    Anap
N-benzylglycine    Nphe
N-(2-carbamylethyl)glycine    Ngln
N-(carbamylmethyl)glycine    Nasn
N-(2-carboxyethyl)glycine    Nglu
N-(carboxymethyl)glycine    Nasp
N-cyclobutylglycine    Ncbut
N-cycloheptylglycine    Nchep
N-cyclohexylglycine    Nchex
N-cyclodecylglycine    Ncdec
N-cyclododecylglycine    Ncdod
N-cyclooctylglycine    Ncoct
N-cyclopropylglycine    Ncpro
N-cycloundecylglycine    Ncund
N-(2,2-diphenylethyl)glycine    Nbhm
N-(3,3-diphenylpropyl)glycine    Nbhe
N-(3-guanidinopropyl)glycine    Narg
N-(1-hydroxyethyl)glycine    Nthr
N-(hydroxyethyl)glycine    Nser
N-(imidazolylethyl)glycine    Nhis
N-(3-indolylethyl)glycine    Nhtrp
N-methyl-γ-aminobutyrate    Nmgabu
D-N-methylmethionine    Dnmmet
N-methylcyclopentylalanine    Nmcpen
D-N-methylphenylalanine    Dnmphe
D-N-methylproline    Dnmpro
D-N-methylserine    Dnmser
D-N-methylthreonine    Dnmthr
N-(1-methylethyl)glycine    Nval
N-methyl-α-napthylalanine    Nmanap
N-methylpenicillamine    Nmpen
N-(p-hydroxyphenyl)glycine    Nhtyr
N-(thiomethyl)glycine    Ncys
L-ethylglycine    Etg
penicillamine    Pen
L-homophenylalanine    Hphe
L-α-methylalanine    Mala
L-α-methylarginine    Marg
L-α-methylasparagine    Masn
L-α-methylaspartate    Masp
L-α-methyl-t-butylglycine    Mtbug
L-α-methylcysteine    Mcys
L-methylethylglycine    Metg
L-α-methylglutamine    Mgln
L-α-methylglutamate    Mglu
L-α-methylhistidine    Mhis
L-α-methylhomophenylalanine    Mhphe
L-α-methylisoleucine    Mile
N-(2-methylthioethyl)glycine    Nmet
L-α-methylleucine    Mleu
L-α-methyllysine    Mlys
L-α-methylmethionine    Mmet
L-α-methylnorleucine    Mnle
L-α-methylnorvaline    Mnva
L-α-methylornithine    Morn
L-α-methylphenylalanine    Mphe
L-α-methylproline    Mpro
L-α-methylserine    Mser
L-α-methylthreonine    Mthr
L-α-methyltryptophan    Mtrp
L-α-methyltyrosine    Mtyr
L-α-methylvaline    Mval
L-N-methylhomophenylalanine    Nmhphe
N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine    Nnbhm
N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine    Nnbhe
1-carboxy-1-(2,2-diphenyl-ethylamino)cyclopropane    Nmbc



The present invention further contemplates chemical
analogues of NR6 capable of acting as antagonists or
agonists of NR6 or which can act as functional analogues
of NR6. Chemical analogues may not necessarily be
derived from NR6 but may share certain conformational
similarities. Alternatively, chemical analogues may be
specifically designed to mimic certain physiochemical
properties of NR6. Chemical analogues may be chemically
synthesised or may be detected following, for example,
natural product screening.

The identification of NR6 permits the generation of a
range of therapeutic molecules capable of modulating
expression of NR6 or modulating the activity of NR6.
Modulators contemplated by the present invention
include agonists and antagonists of NR6 expression.
Antagonists of NR6 expression include antisense
molecules, ribozymes and co-suppression molecules.
Agonists include molecules which increase promoter
ability or interfere with negative regulatory
mechanisms. Agonists of NR6 include molecules which
overcome any negative regulatory mechanism. Antagonists
of NR6 include antibodies and inhibitor peptide
fragments.

Other derivatives contemplated by the present invention
include a range of glycosylation variants from a
completely unglycosylated molecule to a modified
glycosylated molecule. Altered glycosylation patterns
may result from expression of recombinant molecules in
different host cells.

Another embodiment of the present invention
contemplates a method for modulating expression of NR6
in a subject such as a human or mouse, said method
comprising contacting the genetic sequence encoding NR6
with an effective amount of a modulator of NR6
expression for a time and under conditions sufficient to
up-regulate or down-regulate or otherwise modulate
expression of NR6. Modulating NR6 expression provides a
means of modulating NR6-ligand interaction or NR6
stimulation of cell activities.

Another aspect of the present invention contemplates a
method of modulating activity of NR6 in a human, said
method comprising administering to said mammal a
modulating effective amount of a molecule for a time and
under conditions sufficient to increase or decrease NR6
activity. The molecule may be a proteinaceous molecule
or a chemical entity and may also be a derivative of NR6
or its ligand or a chemical analogue or truncation
mutant of NR6 or its ligand.

The present invention, therefore, contemplates a
pharmaceutical composition comprising NR6 or a
derivative thereof or a modulator of NR6 expression or
NR6 activity and one or more pharmaceutically acceptable
carriers and/or diluents. These components are referred
to as the active ingredients.

The pharmaceutical forms suitable for injectable use
include sterile aqueous solutions (where water soluble)
and sterile powders for the extemporaneous preparation
of sterile injectable solutions. It must be stable
under the conditions of manufacture and storage and must
be preserved against the contaminating action of
microorganisms such as bacteria and fungi. The carrier
can be a solvent or dilution medium comprising, for
example, water, ethanol, polyol (for example, glycerol,
propylene glycol and liquid polyethylene glycol, and the
like), suitable mixtures thereof, and vegetable oils.
The proper fluidity can be maintained, for example, by
the use of surfactants. The prevention of the action
of microorganisms can be brought about by various
antibacterial and antifungal agents, for example,
parabens, chlorobutanol, phenol, sorbic acid,
thimerosal and the like. In many cases, it will be
preferable to include isotonic agents, for example,
sugars or sodium chloride. Prolonged absorption of the
injectable compositions can be brought about by the use
in the compositions of agents delaying absorption, for
example, aluminum monostearate and gelatin.

Sterile injectable solutions are prepared by
incorporating the active compounds in the required
amount in the appropriate solvent with various of the
other ingredients enumerated above, as required,
followed by filtered sterilization. In the case of
sterile powders for the preparation of sterile
injectable solutions, the preferred methods of
preparation are vacuum drying and the freeze-drying
technique which yield a powder of the active ingredient
plus any additional desired ingredient from previously
sterile-filtered solution thereof.

When the active ingredients are suitably protected they
may be orally administered, for example, with an inert
diluent or with an assimilable edible carrier, or they may
be enclosed in hard or soft shell gelatin capsules, or they
may be compressed into tablets, or they may be
incorporated directly with the food of the diet. For
oral therapeutic administration, the active compound may
be incorporated with excipients and used in the form of
ingestible tablets, buccal tablets, troches, capsules,
elixirs, suspensions, syrups, wafers, and the like.
Such compositions and preparations should contain at
least 1% by weight of active compound. The percentage
of the compositions and preparations may, of course, be
varied and may conveniently be between about 5 to about
80% of the weight of the unit. The amount of active
compound in such therapeutically useful compositions is
such that a suitable dosage will be obtained. Preferred
compositions or preparations according to the present
invention are prepared so that an oral dosage unit form
contains between about 0.1 µg and 2000 mg of active
compound. Alternative dosage amounts include from about
1 µg to about 1000 mg and from about 10 µg to about 500
mg.

The tablets, troches, pills, capsules and the like may
also contain the components as listed hereafter: a
binder such as gum, acacia, corn starch or gelatin;
excipients such as dicalcium phosphate; a
disintegrating agent such as corn starch, potato starch,
alginic acid and the like; a lubricant such as
magnesium stearate; and a sweetening agent such as
sucrose, lactose or saccharin may be added or a
flavouring agent such as peppermint, oil of wintergreen,
or cherry flavouring. When the dosage unit form is a
capsule, it may contain, in addition to materials of the
above type, a liquid carrier. Various other materials
may be present as coatings or to otherwise modify the
physical form of the dosage unit. For instance,
tablets, pills, or capsules may be coated with shellac,
sugar or both. A syrup or elixir may contain the active
compound, sucrose as a sweetening agent, methyl and
propylparabens as preservatives, a dye and flavouring
such as cherry or orange flavour. Of course, any
material used in preparing any dosage unit form should
be pharmaceutically pure and substantially non-toxic in
the amounts employed. In addition, the active
compound(s) may be incorporated into sustained-release
preparations and formulations.

The present invention also extends to forms suitable for
topical application such as creams, lotions and gels as
well as a range of "paints" which are applied to skin
and through which the active ingredients are absorbed.

Pharmaceutically acceptable carriers and/or diluents
include any and all solvents, dispersion media,
coatings, antibacterial and antifungal agents, isotonic
and absorption delaying agents and the like. The use of
such media and agents for pharmaceutically active
substances is well known in the art and except insofar
as any conventional media or agent is incompatible with
the active ingredient, their use in the therapeutic
compositions is contemplated. Supplementary active
ingredients can also be incorporated into the
compositions.

It is especially advantageous to formulate parenteral
compositions in dosage unit form for ease of
administration and uniformity of dosage. Dosage unit
form as used herein refers to physically discrete units
suited as unitary dosages for the mammalian subjects to
be treated; each unit containing a predetermined
quantity of active material calculated to produce the
desired therapeutic effect in association with the
required pharmaceutical carrier. The specification for
the novel dosage unit forms of the invention is
dictated by and directly dependent on (a) the unique
characteristics of the active material and the
particular therapeutic effect to be achieved, and (b)
the limitations inherent in the art of compounding such
an active material for the treatment of disease in
living subjects having a diseased condition in which
bodily health is impaired as herein disclosed in detail.

The principal active ingredient is compounded for
convenient and effective administration in effective
amounts with a suitable pharmaceutically acceptable
carrier in dosage unit form as hereinbefore disclosed.
A unit dosage form can, for example, contain the
principal active compound in amounts ranging from 0.5 µg
to about 2000 mg. Expressed in proportions, the active
compound is generally present in from about 0.5 µg to
about 2000 mg/ml of carrier. In the case of
compositions containing supplementary active
ingredients, the dosages are determined by reference to
the usual dose and manner of administration of the said
ingredients.

Dosages may also be expressed per body weight of the
recipient. For example, from about 10 ng to about 1000
mg/kg body weight, from about 100 ng to about 500 mg/kg
body weight and from about 1 µg to above 250 mg/kg body
weight may be administered.
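
The per-body-weight ranges can be turned into total amounts for a given recipient by simple multiplication. The following Python sketch shows that arithmetic only. The function name, the choice of integer nanograms as the working unit and the 70 kg body weight are assumptions made for illustration and are not taken from the specification; the third range is entered with 250 mg/kg as its upper figure although the text leaves it open-ended ("above").

```python
NG_PER_MG = 1_000_000  # nanograms in one milligram

# Per-kilogram ranges quoted in the text, as (low, high) in ng per kg.
RANGES_NG_PER_KG = [
    (10, 1000 * NG_PER_MG),   # about 10 ng to about 1000 mg/kg
    (100, 500 * NG_PER_MG),   # about 100 ng to about 500 mg/kg
    (1000, 250 * NG_PER_MG),  # about 1 ug to 250 mg/kg (upper figure assumed)
]


def total_amount_ng(weight_kg, range_ng_per_kg):
    """Scale a (low, high) per-kg range to a total amount in nanograms."""
    low, high = range_ng_per_kg
    return (low * weight_kg, high * weight_kg)


if __name__ == "__main__":
    weight_kg = 70  # hypothetical recipient body weight
    for per_kg in RANGES_NG_PER_KG:
        low, high = total_amount_ng(weight_kg, per_kg)
        print(f"{low} ng to {high // NG_PER_MG} mg")
```

Working in integer nanograms avoids floating-point rounding when the same range spans nanograms and milligrams.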

The pharmaceutical composition may also comprise genetic
molecules such as a vector capable of transfecting
target cells where the vector carries a nucleic acid
molecule capable of modulating NR6 expression or NR6
activity. The vector may, for example, be a viral
vector.

Still another aspect of the present invention is
directed to antibodies to NR6 and its derivatives. Such
antibodies may be monoclonal or polyclonal and may be
selected from naturally occurring antibodies to NR6 or
may be specifically raised to NR6 or derivatives
thereof. In the case of the latter, NR6 or its
derivatives may first need to be associated with a
carrier molecule. The antibodies and/or recombinant NR6
or its derivatives of the present invention are
particularly useful as therapeutic or diagnostic agents.
For example, NR6 antibodies or antibodies to its ligand
may act as antagonists.

For example, NR6 and its derivatives can be used to
screen for naturally occurring antibodies to NR6. These
may occur, for example, in some autoimmune diseases.
Alternatively, specific antibodies can be used to screen
for NR6. Techniques for such assays are well known in
the art and include, for example, sandwich assays and
ELISA. Knowledge of NR6 levels may be important for
diagnosis of certain cancers or a predisposition to
cancers or for monitoring certain therapeutic protocols.

Antibodies to NR6 of the present invention may be
monoclonal or polyclonal. Alternatively, fragments of
antibodies may be used such as Fab fragments.
Furthermore, the present invention extends to
recombinant and synthetic antibodies and to antibody
hybrids. A "synthetic antibody" is considered herein to
include fragments and hybrids of antibodies. The
antibodies of this aspect of the present invention are
particularly useful for immunotherapy and may also be
used as a diagnostic tool for assessing apoptosis or
monitoring the program of a therapeutic regimen.

For example, specific antibodies can be used to screen
for NR6 proteins. The latter would be important, for
example, as a means for screening for levels of NR6 in a
cell extract or other biological fluid or purifying NR6
made by recombinant means from culture supernatant
fluid. Techniques for the assays contemplated herein
are known in the art and include, for example, sandwich
assays and ELISA.

It is within the scope of this invention to include any
second antibodies (monoclonal, polyclonal or fragments
of antibodies or synthetic antibodies) directed to the
first mentioned antibodies discussed above. Both the
first and second antibodies may be used in detection
assays or a first antibody may be used with a
commercially available anti-immunoglobulin antibody. An
antibody as contemplated herein includes any antibody
specific to any region of NR6.

Both polyclonal and monoclonal antibodies are obtainable
by immunization with the enzyme or protein and either
type is utilizable for immunoassays. The methods of
obtaining both types of sera are well known in the art.
Polyclonal sera are less preferred but are relatively
easily prepared by injection of a suitable laboratory
animal with an effective amount of NR6, or antigenic
parts thereof, collecting serum from the animal, and
isolating specific sera by any of the known
immunoadsorbent techniques. Although antibodies
produced by this method are utilizable in virtually any
type of immunoassay, they are generally less favoured
because of the potential heterogeneity of the product.

The use of monoclonal antibodies in an immunoassay is
particularly preferred because of the ability to produce
them in large quantities and the homogeneity of the
product. The preparation of hybridoma cell lines for
monoclonal antibody production derived by fusing an
immortal cell line and lymphocytes sensitized against
the immunogenic preparation can be done by techniques
which are well known to those who are skilled in the
art.

Another aspect of the present invention contemplates a
method for detecting NR6 in a biological sample from a
subject, said method comprising contacting said
biological sample with an antibody specific for NR6 or
its derivatives or homologues for a time and under
conditions sufficient for an antibody-NR6 complex to
form, and then detecting said complex.

Detection of the presence of NR6 may be accomplished in
a number of ways such as by Western blotting and ELISA
procedures. A wide range of immunoassay techniques are
available as can be seen by reference to US Patent Nos.
4,016,043, 4,424,279 and 4,018,653. These, of course,
include both single-site and two-site or "sandwich"
assays of the non-competitive types, as well as the
traditional competitive binding assays. These assays
also include direct binding of a labelled antibody to a
target.

Sandwich assays are among the most useful and commonly
used assays and are favoured for use in the present
invention. A number of variations of the sandwich assay
technique exist, and all are intended to be encompassed
by the present invention. Briefly, in a typical forward
assay, an unlabelled antibody is immobilized on a solid
substrate and the sample to be tested brought into
contact with the bound molecule. After a suitable
period of incubation, for a period of time sufficient to
allow formation of an antibody-antigen complex, a second
antibody specific to the antigen, labelled with a
reporter molecule capable of producing a detectable
signal is then added and incubated, allowing time
sufficient for the formation of another complex of
antibody-antigen-labelled antibody. Any unreacted
material is washed away, and the presence of the antigen
is determined by observation of a signal produced by the
reporter molecule. The results may either be
qualitative, by simple observation of the visible
signal, or may be quantitated by comparing with a
control sample containing known amounts of hapten.
Variations on the forward assay include a simultaneous
assay, in which both sample and labelled antibody are
added simultaneously to the bound antibody. These
techniques are well known to those skilled in the art,
including any minor variations as will be readily
apparent. In accordance with the present invention, the
sample is one which might contain NR6 including cell
extract, tissue biopsy or possibly serum, saliva,
mucosal secretions, lymph, tissue fluid and respiratory
fluid. The sample is, therefore, generally a biological
sample comprising biological fluid but also extends to
fermentation fluid and supernatant fluid such as from a
cell culture.
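
Quantitation "by comparing with a control sample containing known amounts" is commonly done against a standard curve. The Python sketch below shows one simple way to do this, by linear interpolation between neighbouring standards. It is an illustration only: the function name, the data layout and the example readings are hypothetical, the specification does not prescribe a curve-fitting method, and the signal is assumed to rise monotonically with the amount of antigen.

```python
def estimate_amount(signal, standards):
    """Estimate an antigen amount from a measured signal.

    standards is a list of (known_amount, measured_signal) pairs obtained
    from control samples. The estimate is a linear interpolation between
    the two standards whose signals bracket the measured value.
    """
    if len(standards) < 2:
        raise ValueError("at least two standards are required")
    points = sorted(standards, key=lambda pair: pair[1])
    if not points[0][1] <= signal <= points[-1][1]:
        raise ValueError("signal lies outside the range of the standards")
    for (a0, s0), (a1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return a0
            return a0 + (signal - s0) * (a1 - a0) / (s1 - s0)
    raise ValueError("no bracketing standards found")


if __name__ == "__main__":
    # Hypothetical (amount, absorbance) readings for three control samples.
    controls = [(0.0, 0.05), (1.0, 0.25), (2.0, 0.45)]
    print(estimate_amount(0.35, controls))
```

Signals outside the range of the standards are rejected rather than extrapolated, since the curve is only known between the controls.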

In the typical forward sandwich assay, a first antibody
having specificity for the NR6 or antigenic parts
thereof, is either covalently or passively bound to a
solid surface. The solid surface is typically glass or
a polymer, the most commonly used polymers being
cellulose, polyacrylamide, nylon, polystyrene, polyvinyl
chloride or polypropylene. The solid supports may be in
the form of tubes, beads, discs or microplates, or any
other surface suitable for conducting an immunoassay.
The binding processes are well-known in the art and
generally consist of cross-linking, covalently binding or
physically adsorbing; the polymer-antibody complex is
washed in preparation for the test sample. An aliquot
of the sample to be tested is then added to the solid
phase complex and incubated for a period of time
sufficient (e.g. 2-40 minutes or overnight if more
convenient) and under suitable conditions (e.g. from
about room temperature to about 37°C) to allow binding
of any subunit present in the antibody. Following the
incubation period, the antibody subunit solid phase is
washed and dried and incubated with a second antibody
specific for a portion of the hapten. The second
antibody is linked to a reporter molecule which is used
to indicate the binding of the second antibody to the
hapten.

An alternative method involves immobilizing the target
molecules in the biological sample and then exposing the
immobilized target to specific antibody which may or may
not be labelled with a reporter molecule. Depending on
the amount of target and the strength of the reporter
molecule signal, a bound target may be detectable by
direct labelling with the antibody. Alternatively, a
second labelled antibody, specific to the first antibody
is exposed to the target-first antibody complex to form
a target-first antibody-second antibody tertiary
complex. The complex is detected by the signal emitted
by the reporter molecule.

In another alternative method, the NR6 ligand is
immobilised to a solid support and a biological sample
containing NR6 brought into contact with its immobilised
ligand. Binding between NR6 and its ligand can then be
determined using an antibody to NR6 which itself may be
labelled with a reporter molecule or a further
anti-immunoglobulin antibody labelled with a reporter
molecule could be used to detect antibody bound to NR6.

By "reporter molecule" as used in the present
specification, is meant a molecule which, by its
chemical nature, provides an analytically identifiable
signal which allows the detection of antigen-bound
antibody. Detection may be either qualitative or
quantitative. The most commonly used reporter molecules
in this type of assay are either enzymes, fluorophores
or radionuclide containing molecules (i.e.
radioisotopes) and chemiluminescent molecules.

In the case of an enzyme immunoassay, an enzyme is
conjugated to the second antibody, generally by means of
glutaraldehyde or periodate. As will be readily
recognized, however, a wide variety of different
conjugation techniques exist, which are readily
available to the skilled artisan. Commonly used enzymes
include horseradish peroxidase, glucose oxidase,
beta-galactosidase and alkaline phosphatase, amongst others.
The substrates to be used with the specific enzymes are
generally chosen for the production, upon hydrolysis by
the corresponding enzyme, of a detectable colour change.
Examples of suitable enzymes include alkaline
phosphatase and peroxidase. It is also possible to
employ fluorogenic substrates, which yield a fluorescent
product rather than the chromogenic substrates noted
above. In all cases, the enzyme-labelled antibody is
added to the first antibody hapten complex, allowed to
bind, and then the excess reagent is washed away. A
solution containing the appropriate substrate is then
added to the complex of antibody-antigen-antibody. The
substrate will react with the enzyme linked to the
second antibody, giving a qualitative visual signal,
which may be further quantitated, usually
spectrophotometrically, to give an indication of the
amount of hapten which was present in the sample.

"Reporter molecule" also extends to use of cell
agglutination or inhibition of agglutination such as red
blood cells or latex beads, and the like.

Alternately, fluorescent compounds, such as fluorescein
and rhodamine, may be chemically coupled to antibodies
without altering their binding capacity. When activated
by illumination with light of a particular wavelength,
the fluorochrome-labelled antibody absorbs the light
energy, inducing a state of excitability in the
molecule, followed by emission of the light at a
characteristic colour visually detectable with a light
microscope. As in the EIA, the fluorescent labelled
antibody is allowed to bind to the first antibody-hapten
complex. After washing off the unbound reagent, the
remaining tertiary complex is then exposed to the light
of the appropriate wavelength; the fluorescence observed
indicates the presence of the hapten of interest.

Immunofluorescence and EIA techniques are both very well
established in the art and are particularly preferred
for the present method. However, other reporter
molecules, such as radioisotope, chemiluminescent or
bioluminescent molecules, may also be employed.

The present invention also contemplates genetic assays
such as involving PCR analysis to detect the NR6 gene or
its derivatives. Alternative methods or methods used in
conjunction include direct nucleotide sequencing or
mutation scanning such as single stranded conformational
polymorphism analysis (SSCP), specific
oligonucleotide hybridisation, and methods such as direct
protein truncation tests.

The nucleic acid molecules of the present invention may 
be DNA or RNA. When the nucleic acid molecule is in a 
DNA form, it may be genomic DNA or cDNA. RNA forms of 
the nucleic acid molecules of the present invention are 
generally mRNA. 

Although the nucleic acid molecules of the present
invention are generally in isolated form, they may be
integrated into or ligated to or otherwise fused or
associated with other genetic molecules such as vector
molecules and in particular expression vector molecules.
Vectors and expression vectors are generally capable of
replication and, if applicable, expression in one or
both of a prokaryotic cell or a eukaryotic cell.
Preferably, prokaryotic cells include E. coli, Bacillus
sp and Pseudomonas sp. Preferred eukaryotic cells
include yeast, fungal, mammalian and insect cells.

Accordingly, another aspect of the present invention
contemplates a genetic construct comprising a vector
portion and a mammalian and more particularly a human
NR6 gene portion, which NR6 gene portion is capable of
encoding an NR6 polypeptide or a functional or
immunologically interactive derivative thereof.

Preferably, the NR6 gene portion of the genetic
construct is operably linked to a promoter on the vector
such that said promoter is capable of directing
expression of said NR6 gene portion in an appropriate
cell.

In addition, the NR6 gene portion of the genetic
construct may comprise all or part of the gene fused to
another genetic sequence such as a nucleotide sequence
encoding maltose binding protein or
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs 
and to prokaryotic or eukaryotic cells comprising same. 

The present invention also extends to any or all
derivatives of NR6 including mutants, parts, fragments,
portions, homologues and analogues or their encoding
genetic sequence including single or multiple nucleotide
or amino acid substitutions, additions and/or deletions
to the naturally occurring nucleotide or amino acid
sequence.

NR6 may be important for the proliferation,
differentiation and survival of a diverse array of cell
types. Accordingly, it is proposed that NR6 or its
functional derivatives be used to regulate development,
maintenance or regeneration in an array of different
cells and tissues in vitro and in vivo. For example,
NR6 is contemplated to be useful in modulating neuronal
proliferation, differentiation and survival.

Soluble NR6 polypeptides are also contemplated to be
useful in the treatment of a range of diseases, injuries
or abnormalities.

Membrane bound or soluble NR6 may be used in vitro on
nerve cells or tissues to modulate proliferation,
differentiation or survival, for example, in grafting
procedures or transplantation.

As stated above, the NR6 of the present invention or its
functional derivatives may be provided in a
pharmaceutical composition comprising the NR6 together
with one or more pharmaceutically acceptable carriers
and/or diluents. In addition, the present invention
contemplates a method of treatment comprising the
administration of an effective amount of a NR6 of the
present invention. The present invention also extends
to antagonists and agonists of NR6s and their use in
therapeutic compositions and methodologies.

A further aspect of the present invention contemplates
the use of NR6 or its functional derivatives in the
manufacture of a medicament for the treatment of NR6
mediated conditions defective or deficient.

Still a further aspect of the present invention
contemplates a ligand for NR6, preferably in isolated or
recombinant form, or a derivative of said ligand.

The present invention further contemplates knockout
animals such as mice or other murine species for the NR6
gene including homozygous and heterozygous knockout
animals. Such animals provide a particularly useful
live in vivo model for studying the effects of NR6 as
well as screening for agents capable of acting as
agonists or antagonists of NR6.

According to this embodiment there is provided a
transgenic animal comprising a mutation in at least one
allele of the gene encoding NR6. Additionally, the
present invention provides a transgenic animal
comprising a mutation in two alleles of the gene
encoding NR6. Preferably, the transgenic animal is a
murine animal such as a mouse or rat.

The present invention is further described by the 
following non-limiting Figures and Examples. 

In the Figures:

Figure 1 is a diagrammatic representation showing
expansion of sequenced region of the mouse NR6 gene
indicating splicing patterns seen in the three forms of
NR6 cDNA, NR6.1, NR6.2 and NR6.3.

Figure 2 is a representation of the nucleotide sequence
of the mouse NR6 gene, containing exons encoding the
cDNA from nucleotide 148 encoding D50 of the cDNAs shown
in SEQ ID NOs: 12 and 14 to the end of the 3'
untranslated region shared by NR6.1, NR6.2 and
NR6.3. In this figure, this region encompasses
nucleotides g1182 to g6617. This sequence is also
defined in SEQ ID NO: 28.

Figure 3 is a representation of the nucleotide sequence
of the mouse genomic NR6 gene with additional 5'
sequences. The coding exons of NR6 span approximately
11kb of the mouse genome. There are 9 coding exons
separated by 8 introns:

exon 1        at least 239nt
intron 1      5195nt
exon 2        282nt
intron 2      214nt
exon 3        130nt
intron 3      107nt
exon 4        170nt
intron 4      1372nt
exon 5        158nt
intron 5      68nt
exon 6        169nt
intron 6      2020nt
exon 7        188nt
intron 7      104nt
exon 8        43nt
intron 8      181nt
exon 9        252nt

Exon 1 encodes the signal sequence, exon 2 the Ig-like
domain, exons 3 to 6 the hemopoietin domain. Exons 7, 8
and 9 are alternatively spliced.
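
The "approximately 11kb" figure can be checked by summing the exon and intron lengths listed for Figure 3. The short Python sketch below does this; the variable and function names are illustrative, and because exon 1 is given only as "at least 239nt" the result is a lower bound on the span.

```python
# Lengths in nucleotides, in genomic order, as listed for Figure 3.
EXON_LENGTHS = [239, 282, 130, 170, 158, 169, 188, 43, 252]  # exon 1 is a minimum
INTRON_LENGTHS = [5195, 214, 107, 1372, 68, 2020, 104, 181]


def locus_span(exons, introns):
    """Return the total length of alternating exons and introns."""
    if len(introns) != len(exons) - 1:
        raise ValueError("n exons must be separated by n - 1 introns")
    return sum(exons) + sum(introns)


if __name__ == "__main__":
    span = locus_span(EXON_LENGTHS, INTRON_LENGTHS)
    print(f"exons: {sum(EXON_LENGTHS)} nt, introns: {sum(INTRON_LENGTHS)} nt")
    print(f"minimum span of coding exons: {span} nt")
```

The sum comes to a little under 11,000 nucleotides, which agrees with the stated span.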

Figure 4 is a diagrammatic representation showing the
genomic structure of murine NR6.

Figure 5 is a diagrammatic representation showing
targetting of the NR6 locus by homologous recombination.

Single and three letter abbreviations for amino acid
residues used in the specification are summarised in
Table 2:

TABLE 2

Amino Acid        Three-letter      One-letter
                  Abbreviation      Symbol

Alanine           Ala               A
Arginine          Arg               R
Asparagine        Asn               N
Aspartic acid     Asp               D
Cysteine          Cys               C
Glutamine         Gln               Q
Glutamic acid     Glu               E
Glycine           Gly               G
Histidine         His               H
Isoleucine        Ile               I
Leucine           Leu               L
Lysine            Lys               K
Methionine        Met               M
Phenylalanine     Phe               F
Proline           Pro               P
Serine            Ser               S
Threonine         Thr               T
Tryptophan        Trp               W
Tyrosine          Tyr               Y
Valine            Val               V
Any residue       Xaa               X
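
The abbreviations of Table 2 can be held as a simple lookup for converting between the two notations used in the specification. The Python sketch below is illustrative only; the dictionary and function names are assumptions, and the three-letter codes use the standard spellings Gln and Ile.

```python
# Three-letter to one-letter amino acid codes, following Table 2.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",  # any residue
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """Convert a list of three-letter codes to a one-letter string."""
    return "".join(THREE_TO_ONE[residue] for residue in residues)


def to_three_letter(sequence):
    """Convert a one-letter string to hyphen-separated three-letter codes."""
    return "-".join(ONE_TO_THREE[symbol] for symbol in sequence)


if __name__ == "__main__":
    print(to_one_letter(["Trp", "Ser", "Xaa", "Trp", "Ser"]))
    print(to_three_letter("VLPAKL"))
```

The two example inputs are the WSXWS motif and the VLPAKL residues that the specification refers to elsewhere.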
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TABLE 3 
SUMMARY OF SEQ ID NO.

Sequence                                          SEQ ID NO.

Amino acid sequence WSXWS                         1
Oligonucleotide primers and probes listed
  in Example 1                                    2-11
Nucleotide sequence of NR6.1¹                     12
Amino acid sequence of NR6.1                      13
Nucleotide sequence of NR6.2²                     14
Amino acid sequence of NR6.2                      15
Nucleotide sequence of NR6.3³                     16
Amino acid sequence of NR6.3                      17
Nucleotide sequence of products generated
  by 5' RACE of brain cDNA using NR6
  specific primers⁴                               18
Amino acid sequence of SEQ ID NO: 18              19
Nucleotide sequence unique to 5' RACE of
  brain cDNA                                      20
Amino acid sequence for SEQ ID NO: 20             21
Unspliced murine NR6 nucleotide sequence          22
PCR product for human NR6                         23
Nucleotide sequence of clone HFK-66
  encoding human NR6                              24
Amino acid sequence of SEQ ID NO: 24              25
Oligonucleotide sequences UP1 and LP1,
  respectively                                    26-27
Genomic nucleotide sequence of murine NR6         28
Amino acid sequence of SEQ ID NO: 28              29
Murine NR6.1 oligonucleotide primers              30, 31
Murine IL-3 signal sequence                       32
Linker sequence for mouse IL-3 signal
  sequence and FLAG epitope                       33-35
Genomic nucleotide sequence of murine NR6
  containing additional 5' sequence               38
Oligonucleotide 2199 and 2200, respectively       36, 37
N-terminal region of NR6                          39

¹The polyadenylation signal AATAAATAAA is at nucleotide
position 1451 to 1460; NR6.1 (SEQ ID NO: 12) and NR6.2
(SEQ ID NO: 14) are identical to nucleotide 1223 encoding
Q407, this represents the end of an exon. NR6.1 splices
out an exon present only in NR6.2 and uses a different
reading frame for the final exon which is shared with
NR6.2; this corresponds to amino acids VLPAKL at amino
acid residue positions 408-413. The region of
3'-untranslated DNA shared by NR6.1, NR6.2 and NR6.3 is
from nucleotide 1240 to 1475. The WSXWS motif is at
amino acid residues 330 to 334.

²The polyadenylation signal AATAAA is at nucleotide
positions 1494 to 1503. The WSXWS motif is at amino
acid residues 330 to 334. NR6.1 and NR6.2 are identical
to nucleotide 1223 encoding Q407 which represents the
end of an exon. NR6.2 splices in an exon beginning at
amino acid residue D408, nucleotide 1224 and ends at
residue G422, nucleotide 1264. The region of 3'
untranslated DNA shared by NR6.1, NR6.2 and NR6.3 is
from nucleotide position 1283 to 1517.

³The nucleotide and amino acid numbering corresponds to
SEQ ID NO: 12 and 14. The WSXWS motif is at amino acid
residues 330 to 334. The polyadenylation signal
AATAAATAAA is from nucleotide 1781 to 1780. NR6.1,
NR6.2 and NR6.3 are identical to nucleotide 1223
encoding Q407, this represents the end of an exon.
NR6.3 fails to splice from this position and, therefore,
translation continues through the intron, giving rise to
the C-terminal protein region from amino acid residues
408 to 461. The region of 3' untranslated DNA shared by
NR6.1, NR6.2 and NR6.3 is from nucleotide 1469 to 1804.

⁴The nucleotide sequence is identical to NR6.1, NR6.2
and NR6.3 from nucleotide C151, the first nucleotide for
Pro51. The numbering from this nucleotide is the same
as for SEQ ID NO: 14 and 16. The sequence 5' of this
point is unique to the products generated by 5' RACE, not
being found in NR6.1, NR6.2 and NR6.3, and is represented
in SEQ ID NOs: 20 and 21.

⁵Structure of the murine genomic NR6 locus. The coding
exons of NR6 span approximately 11kb of the mouse
genome. There are 9 coding exons separated by 8
introns:

exon 1        at least 239nt
exon 2        282nt
exon 3        130nt
exon 4        170nt
exon 5        158nt
exon 6        169nt
exon 7        188nt
exon 8        43nt
exon 9        252nt

Exon 1 encodes the signal sequence, exon 2 the Ig-like
domain, exons 3 to 6 the hemopoietin domain. Exons 7, 8
and 9 are alternatively spliced.

The NR6 molecules of the present invention have a range of
utilities referred to in the subject specification.
Additional utilities include:

1. Identification of molecules that interact with NR6.
These may include:

a) a corresponding ligand using standard orphan receptor
techniques (26),

b) monoclonal antibodies that act either as receptor
antagonists or agonists,

c) mimetic or antagonistic peptides isolated using phage
display technology (27,28),

d) small molecule natural products that act either as
antagonists or agonists.

2. Development of diagnostics to detect
deletions/rearrangements in the NR6 gene.

The NR6 knock-out mice studies described herein provide a 
10 useful model for this utility. There are also applications 
in the field of reproduction. For example, people can be 
tested for their NR6 status. NR6 +/- carriers might be 
expected to give rise to offspring with developmental 
problems . 
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10 



M116:   5' ACTCGCTCCAGATTCCCGCCTTTT 3'      [SEQ ID NO: 2]
M108:   5' TCCCGCCTTTTTCGACCCATAGAT 3'      [SEQ ID NO: 3]
M159:   5' GGTACTTGGCTTGGAAGAGGAAAT 3'      [SEQ ID NO: 4]
M242:   5' CGGCTCACGTGCACGTCGGGTGGG 3'      [SEQ ID NO: 5]
M112:   5' AGCTGCTGTTAAAGGGCTTCTC 3'        [SEQ ID NO: 6]
WSDWS:  5' (A/G)CTCCA(A/G)TC(A/G)CTCCA 3'   [SEQ ID NO: 7]
WSEWS:  5' (A/G)CTCCA(C/T)TC(A/G)CTCCA 3'   [SEQ ID NO: 8]
1944:   5' AAGTGTGACCATCATGTGGAC 3'         [SEQ ID NO: 9]
2106:   5' GGAGGTGTTAAGGAGGCG 3'            [SEQ ID NO: 10]
2120:   5' ATGCCCGCGGGTCGCCCG 3'            [SEQ ID NO: 11]
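The relationship between the degenerate WSDWS and WSEWS oligonucleotides and the WSXWS motif can be checked with a short sketch. It assumes that the oligonucleotides are antisense to the coding strand (so that the reverse complement is what is translated) and uses the standard genetic code; the function names are illustrative only and are not part of the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_degenerate(oligo):
    """Split '(A/G)CTCCA...' into a list of allowed bases per position."""
    positions, i = [], 0
    while i < len(oligo):
        if oligo[i] == "(":
            end = oligo.index(")", i)
            positions.append(oligo[i + 1:end].split("/"))
            i = end + 1
        else:
            positions.append([oligo[i]])
            i += 1
    return positions


def expand(oligo):
    """Return every concrete sequence represented by a degenerate oligo."""
    return ["".join(p) for p in product(*parse_degenerate(oligo))]


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the first base of seq."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def encoded_peptides(oligo):
    """Peptides read from the strand complementary to each oligo variant."""
    return {translate(reverse_complement(s)) for s in expand(oligo)}


WSDWS = "(A/G)CTCCA(A/G)TC(A/G)CTCCA"  # SEQ ID NO: 7 as listed above
WSEWS = "(A/G)CTCCA(C/T)TC(A/G)CTCCA"  # SEQ ID NO: 8 as listed above

for name, oligo in (("WSDWS", WSDWS), ("WSEWS", WSEWS)):
    print(name, len(expand(oligo)), sorted(encoded_peptides(oligo)))
```

Under these assumptions each oligonucleotide expands to eight 15-mers, and the complementary strand of every variant reads as the pentapeptide after which the probe is named.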



EXAMPLE 2

Isolation of initial NR6 cDNA clones using 
oligonucleotides designed against the conserved WSXWS 
motif found in members of the haemopoietin receptor 
family 

(i) A commercial adult mouse testis cDNA library cloned
into the UNI-ZAP bacteriophage (Stratagene, CA, USA;
Catalogue number 937308) was used to infect
Escherichia coli of the strain LE392. Infected bacteria
were grown on twenty 150 mm agar plates, to give
approximately 50,000 plaques per plate. Plaques were
then transferred to duplicate 150 mm diameter nylon
membranes (Colony/Plaque Screen, NEN Research Products,
MA, USA), bacteria were lysed and the DNA was denatured
and fixed by autoclaving at 100°C for 1 min with dry
exhaust. The filters were rinsed twice in 0.1% (w/v)
sodium dodecyl sulfate (SDS), 0.1 x SSC (SSC is 150 mM
sodium chloride, 15 mM sodium citrate dihydrate) at room
temperature and pre-hybridized overnight at 42°C in 6 x
SSC containing 2 mg/ml bovine serum albumin, 2 mg/ml
Ficoll, 2 mg/ml polyvinylpyrrolidone, 100 mM ATP, 10
mg/ml tRNA, 2 mM sodium pyrophosphate, 2 mg/ml salmon
sperm DNA, 0.1% (w/v) SDS and 200 mg/ml sodium azide.
The pre-hybridisation buffer was removed. 1.2 µg of the
degenerate oligonucleotides for hybridization (WSDWS;
Example 1) were phosphorylated with T4 polynucleotide
kinase using 960 mCi of [γ-32P]ATP (Bresatec,
Australia). Unincorporated ATP was separated from the
labelled oligonucleotide using a pre-packed gel
filtration column (NAP-5; Pharmacia, Uppsala, Sweden).
Filters were hybridized overnight at 42°C in 80 ml of
the prehybridisation buffer containing 0.1% (w/v) SDS,
rather than NP40, and 10^ - 10^ cpm/ml of labelled
oligonucleotide. Filters were briefly rinsed twice at
room temperature in 6 x SSC, 0.1% (v/v) SDS, twice for 30
min at 45°C in a shaking waterbath containing 1.5 l of
the same buffer and then briefly in 6 x SSC at room
temperature. Filters were then blotted dry and exposed
to autoradiographic film at -70°C using intensifying
screens, for 7-14 days prior to development.

Plaques that appeared positive on orientated duplicate
filters were picked, eluted in 1 ml of 100 mM NaCl, 10
mM MgCl2, 10 mM Tris.HCl pH 7.4 containing 0.5% (w/v)
gelatin and 0.5% (v/v) chloroform and stored at 4°C.
After 2 days LE392 cells were infected with the eluate
from the primary plugs and replated for the secondary
screen. This process was repeated until hybridizing
plaques were pure.
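The rinse and hybridization buffers above are expressed as multiples of SSC. As a small worked example, the following sketch converts a fold-concentration into millimolar salt concentrations, taking only the 1 x definition given in the text (150 mM sodium chloride, 15 mM sodium citrate); the function name is an illustrative assumption.

```python
# 1 x SSC as defined in the text: 150 mM sodium chloride, 15 mM sodium citrate.
SSC_1X_MM = {"sodium chloride": 150.0, "sodium citrate": 15.0}


def ssc_composition(fold):
    """Return the salt concentrations (mM) of an n-fold SSC solution."""
    return {salt: round(fold * mm, 3) for salt, mm in SSC_1X_MM.items()}


# The pre-hybridization buffer (6 x SSC) and the initial rinse (0.1 x SSC).
for fold in (6, 0.1):
    print(f"{fold} x SSC:", ssc_composition(fold))
```

On this definition, 6 x SSC corresponds to 900 mM sodium chloride and 90 mM sodium citrate, and 0.1 x SSC to 15 mM and 1.5 mM respectively.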

Once purified, positive cDNAs were excised from the ZAP
II bacteriophage according to the manufacturer's
instructions (Stratagene, CA, USA) and cloned into the
plasmid pBluescript. A CsCl purified preparation of the
DNA was made and this was sequenced on both strands.
Sequencing was performed using an Applied Biosystems
automated DNA sequencer, with fluorescent
dideoxynucleotide analogues according to the
manufacturer's instructions. The DNA sequence was
analysed using software supplied by Applied Biosystems.

Two clones isolated from the mouse testis cDNA library,
68-1 and 68-2, shared large regions of nucleotide sequence
identity and appeared to encode a novel member of the
haemopoietin receptor family. The inventors gave the
putative receptor the working name "NR6".

(ii) In a parallel series of experiments, a commercial
mouse brain cDNA library (STRATAGENE #967319, Balb/c
day-20, whole brain cDNA/Uni-ZAP XR Vector) was used to
infect E. coli strain XL1-Blue MRF'. Infected bacteria
were grown on 90 x 135 mm square agar plates to give about
25,000 plaques per plate. Plaques were then transferred
to positively charged nylon membranes, Hybond-N(+)
(Amersham RPN 203B), bacteria were lysed and the DNA was
denatured with 0.5 M NaOH, 1.5 M NaCl at room
temperature for 7 min. The membranes were neutralized
with 0.5 M Tris-HCl pH 7.2, 1.5 M NaCl, 1 mM EDTA at room
temperature for 10 min before DNA fixation by UV
crosslinking.

A mixture of WSDWS and WSEWS oligonucleotide probes (SEQ
ID NOs: 7 and 8) was labelled with [γ-32P]ATP
(TOYOBO #PNK-104 Kination kit). The membranes from the
mouse brain cDNA library were then hybridized with the
mixture of WSDWS and WSEWS oligonucleotide probes in the
Rapid Hybridization Buffer (Amersham, RPN1636) at 42°C
for 16 hours. Filters were washed with 1 x SSC/0.1% (w/v)
SDS at 42°C before autoradiography. Plaques that
appeared positive on orientated duplicate filters were
picked and replated on E. coli XL1-Blue MRF', with the
process of immobilisation on nylon membranes,
hybridization of membranes with oligonucleotide probes,
washing and autoradiography repeated until pure plaques
had been obtained.

The cDNA fragment from pure positively hybridizing
plaques was isolated by excision with the helper phage
strain ExAssist according to the manufacturer's
instructions (Stratagene, #967319). Sequencing was
performed after amplification with Ampli-Taq DNA
polymerase and the Taq dideoxy terminator cycle sequencing
kit (Perkin Elmer, #401150) by 25 cycles of 96°C for 10
sec, 50°C for 5 sec, 60°C for 4 min followed by 60°C for
5 min with the sequencing primers on an ABI model 377
DNA sequencer.

One clone, MBC-8, from the mouse brain library shared
large regions of nucleotide sequence identity with both
the 68-1 and 68-2 clones isolated from the mouse testis
cDNA library.

(iii) In a third series of experiments, total RNA was
prepared from the mouse osteoblastic cell line, KUSA,
according to the method of Chirgwin et al. (15), and
poly(A)+ RNA was further purified by oligo(dT)-cellulose
chromatography (Pharmacia Biotech). Complementary DNA
was synthesized by oligo(dT) priming, inserted into the
UniZAP XR directional cloning vector (Stratagene), and
packaged into λ phage using Gigapack Gold (Stratagene),
yielding 1.25 x 10 independent clones.

Approximately 10^ clones were screened essentially as
described in (ii) above. Briefly, probes were labeled
with 32P using T4 polynucleotide kinase and
prehybridization was performed for 4 hr in the Rapid
hybridization buffer (Amersham LIFE SCIENCE) at 42°C.
Filters (Hybond N+, Amersham) were then hybridized for
19 hr under the same conditions with the addition of
32P-labeled WSXWS mix oligonucleotides and washed 3 times.
The final wash was for 30 min in 1 x SSPE, 0.1% (w/v)
SDS at 42°C. Filters were then exposed with an
intensifying screen to Kodak X-OMAT AR film for 5 days.

Isolated clones were subjected to the in vivo excision
of pBluescript SK(-) phagemid (Stratagene), and plasmid
DNA was prepared by the standard method. DNA sequences
were determined using an ABI PRISM 377 DNA Sequencer
(Perkin Elmer) with appropriate synthetic
oligonucleotide primers. A clone, pKUSA166, shared large
regions of nucleotide sequence identity with the MBC-8,
68-1 and 68-2 clones isolated from the mouse brain and
testis cDNA libraries.



EXAMPLE 3

Isolation of further NR6 cDNA clones using probes 
specific for NR6 



(i) In order to identify other cDNA libraries
containing cDNA clones for NR6, the inventors performed
PCR upon 1 µl aliquots of λ-bacteriophage cDNA libraries
made from mRNA from various human tissues and using
oligonucleotides 2070 and 2057, designed from the
sequence of 68-1 and 68-2, as primers. Reactions
contained 5 µl of 10 x concentrated PCR buffer
(Boehringer Mannheim GmbH, Mannheim, Germany), 1 µl of
10 mM dATP, dCTP, dGTP and dTTP, 2.5 µl of the
oligonucleotides HYB2 and either T3 or T7 at a
concentration of 100 mg/ml, 0.5 µl of Taq polymerase
(Boehringer Mannheim GmbH) and water to a final volume
of 50 µl. PCR was carried out in a Perkin-Elmer 9600 by
heating the reactions to 96°C for 2 min and then for 25
cycles at 95°C for 30 sec, 55°C for 30 sec and 72°C for
2 min. PCR products were resolved on an agarose gel,
immobilized on a nylon membrane and hybridized with
32P-labelled oligonucleotide 1943 (SEQ ID NO:42).



In addition to the original library, a mouse brain cDNA
library appeared to contain NR6 cDNAs. These were
screened using 32P-labelled oligonucleotides 1944,
2106 and 2120 (Example 1) or with a fragment of the
original NR6 cDNA clone from 68-1 (nucleotide 934 to the
end of NR6.1 in Figure 1) labelled with 32P using a
random decanucleotide labelling kit (Bresatec).
Conditions used were similar to those described in (i)
above except that, for the labelled oligonucleotides,
filters were washed at 55°C rather than 45°C, while for
the NR6 cDNA fragment prehybridization and hybridization
were carried out in 2 x SSC and filters were washed in 0.2
x SSC at 65°C. Again, as described in (i) above,
positively hybridising plaques were purified, and the cDNAs
were recovered and cloned into plasmids pBluescript II
or pUC19. Independent cDNA clones were sequenced on
both strands.

Using this procedure, 6 further clones, 68-5, 68-35,
68-41, 68-51, 68-77 and 73-23, were found to contain large
regions of sequence identity with 68-1, 68-2, MBC-8 and
pKUSA166.

In a parallel series of experiments, further screening
was performed with hybridization probes prepared from
the 1.7 kbp EcoRI-XhoI fragment excised from pKUSA166.
This fragment was excised and labeled with 32P by using
the T7QuickPrime Kit (Pharmacia Biotech). Approximately
6x10^ clones were screened. Hybond N+ filters
(Amersham) were first prehybridized for 4 hr at 42°C in
50% (v/v) formamide, 5 x SSPE, 5 x Denhardt's solution, 0.1%
(w/v) SDS, and 0.1 mg/ml denatured salmon sperm DNA.
Hybridization was for 16 hours under the same conditions
with the addition of 32P-labelled NR6 cDNA fragment
probes. Finally the filters were washed once for 1 hr in
0.2 x SSC, 0.1% (w/v) SDS at 68°C. Eight clones were
isolated, and phage clones were subjected to the in vivo
excision of the pBluescript SK(-) phagemid (Stratagene).
The plasmid DNAs were prepared by the standard method.
DNA sequences were determined by an ABI PRISM 377 DNA
Sequencer using appropriate synthetic oligonucleotide
primers.

Using this procedure, 8 further clones from the KUSA
library that contained large regions of sequence identity
with 68-1, 68-2, MBC-8, pKUSA166, 68-5, 68-35, 68-41,
68-51, 68-77 and 73-23 were isolated.

5 

EXAMPLE 4 
Isolation of genomic DNA encoding NR6 

DNA encoding the murine NR6 genomic locus was also
isolated using the 68-1 cDNA as a probe. Two positive
clones, 2-2 and 57-3, were isolated from a mouse 129/Sv
strain genomic DNA library cloned into λ FIX. These
clones were overlapping and the positions of the
restriction sites, introns and exons were determined in
the conventional manner. The regions of the genomic
clones containing exons and the intervening introns were
sequenced on both strands using an Applied Biosystems
automated DNA sequencer, with fluorescent
dideoxynucleotide analogues according to the
manufacturer's instructions. Figure 2 shows the
nucleotide sequence and corresponding amino acid
sequence of the translated regions. This is also shown
in SEQ ID NOs:30 and 31. Figure 3 provides the genomic
NR6 gene sequence but with additional 5' sequence. This
is also represented in SEQ ID NO: 38. The coding exons
of NR6 span approximately 11 kb of the mouse genome.
There are 9 coding exons separated by 8 introns:



exon 1      at least 239 nt
intron 1    5195 nt
exon 2      282 nt
intron 2    214 nt
exon 3      130 nt
intron 3    107 nt
exon 4      170 nt
intron 4    1372 nt
exon 5      158 nt
intron 5    68 nt
exon 6      169 nt
intron 6    2020 nt
exon 7      188 nt
intron 7    104 nt
exon 8      43 nt
intron 8    181 nt
exon 9      252 nt

Exon 1 encodes the signal sequence, exon 2 the Ig-like
domain and exons 3 to 6 the hemopoietin domain. Exons 7, 8
and 9 are alternatively spliced.
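The statement that the coding exons span approximately 11 kb can be cross-checked against the tabulated exon and intron lengths. The sketch below simply sums the figures listed above; exon 1 is given only as "at least 239 nt", so the total is a lower bound, and the variable and function names are illustrative.

```python
# Lengths in nucleotides, in genomic order, as tabulated above.
EXONS = [239, 282, 130, 170, 158, 169, 188, 43, 252]   # exon 1 is "at least 239"
INTRONS = [5195, 214, 107, 1372, 68, 2020, 104, 181]


def locus_span(exons, introns):
    """Total span from the start of the first exon to the end of the last."""
    if len(introns) != len(exons) - 1:
        raise ValueError("expected exactly one intron between adjacent exons")
    return sum(exons) + sum(introns)


total = locus_span(EXONS, INTRONS)
print(f"exons: {sum(EXONS)} nt, introns: {sum(INTRONS)} nt, span: {total} nt")
```

With the listed values the exons contribute 1631 nt and the introns 9261 nt, giving a span of at least 10892 nt, which agrees with the stated figure of approximately 11 kb.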



EXAMPLE 5

5' RACE analysis of NR6

5'-RACE was used to investigate the nature of the
sequence 5' of nucleotide 960, encoding Ile321 of NR6.1,
NR6.2 and NR6.3. The nucleotide and corresponding amino acid
sequences are shown in SEQ ID NOs: 12, 14 and 16,
respectively. 5'-RACE was performed using Advantage
KlenTaq polymerase (Clontech, cat no. K1905-1) on mouse
brain Marathon-ready cDNA (Clontech, cat no. 7450-1)
according to the manufacturer's instructions. Briefly,
the first rounds of amplification were performed using
5 µl of cDNA in a total volume of 50 µl, with 1 mM each of
the primers AP1 & M116 [SEQ ID NO: 2] or AP1 & M159 [SEQ ID
NO: 4] by 35 cycles of 94°C x 0.5 min, 68°C x 2.0 min on a
GeneAmp 2400 (Perkin-Elmer). An amount of 5 µl of 50-fold
diluted product from the first amplification was
then re-amplified. For the products generated with
primers AP1 and M116 [SEQ ID NO: 2] in the first
amplification, 1 mM of the primers AP2 & M108 [SEQ ID
NO: 3] were used in the second amplification. For the
products generated with primers AP1 and M159 [SEQ ID
NO: 4] in the first amplification, two separate secondary
reactions were performed, one reaction with 1 mM primers
AP2 & M242 [SEQ ID NO: 5] and the other with 1 mM primers
AP2 & M112 [SEQ ID NO: 6]. Amplification was achieved
using 25 cycles of 94°C x 0.5 min, 68°C x 2.0 min. These
samples were analyzed by agarose gel electrophoresis.
When a single ethidium bromide staining amplification
product was observed, it was purified using the QIAquick PCR
purification kit according to the manufacturer's
instructions (Qiagen, cat no. DG-0281) and its sequence was
directly determined using both primers used in the
secondary amplification step, that is AP2 and either
M108 [SEQ ID NO: 3], M242 [SEQ ID NO: 5] or M112 [SEQ ID
NO: 6].
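As a rough planning aid, the hold time of the two-step cycling programmes described in this Example can be estimated as follows. This is a sketch under the assumption that only the programmed holds are counted; ramp times, any initial denaturation and any final extension are not stated in the text and are ignored. The names used are illustrative.

```python
def cycling_minutes(cycles, steps):
    """Total programmed hold time in minutes for a repeated set of steps.

    steps is a list of (temperature in degrees C, hold time in minutes).
    """
    return cycles * sum(minutes for _temperature, minutes in steps)


# Two-step cycle used in both rounds: 94 C for 0.5 min, 68 C for 2.0 min.
TWO_STEP = [(94, 0.5), (68, 2.0)]

primary = cycling_minutes(35, TWO_STEP)     # first round, 35 cycles
secondary = cycling_minutes(25, TWO_STEP)   # nested round, 25 cycles
print(f"primary: {primary} min, secondary: {secondary} min")
```

On these assumptions the first round occupies 87.5 min of hold time and the nested round 62.5 min.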

EXAMPLE 6

Cloning of NR6

From the initial screens of mouse brain and testis cDNA
libraries with the degenerate WSXWS oligonucleotides and
subsequent screening of cDNA libraries from mouse
testis, mouse brain and the KUSA osteoblastic cell line,
a total of 18 NR6 cDNAs have been isolated. Nucleotide
sequence of NR6 was also determined from 5' RACE analysis
of brain cDNA. Additionally, two murine genomic DNA
clones encoding NR6 have been isolated.

Comparison of the NR6 cDNA clones revealed a common
region of nucleotide sequence which included a 123 base
pair 5'-untranslated region and a 1221 base pair open
reading frame, stretching from the putative initiation
methionine, Met1, to Gln407 (SEQ ID NOs:12, 14 and 16,
respectively). Within this common open reading frame, a
haemopoietin receptor domain was observed which
contained the four conserved cysteine residues and the
five amino acid motif WSXWS typical of members of the
haemopoietin receptor family.

Further analyses revealed that after nucleotide 1221,
three different classes of NR6 cDNAs could be found;
these were termed NR6.1, NR6.2 and NR6.3 (SEQ ID NOs:12,
14 and 16, respectively). Each encoded a receptor that
appeared to lack a classical transmembrane domain and
would, therefore, be likely to be secreted into the
extracellular environment. Although the putative
C-terminal regions of the three classes of NR6 proteins
appear to be different, the cDNAs encoding them also had
a common region of 3'-untranslated sequence.

With regard to SEQ ID NOs:12, 14 and 16, the numbering of
both nucleotides and amino acids begins at the putative
initiation methionine. NR6.1 and NR6.2 are identical to
nucleotide 1223 encoding Q407; this represents the end
of an exon. NR6.1 splices out an exon present only in
NR6.2 and uses a different reading frame for the final
exon which is shared with NR6.2. The 3'-untranslated
region is shared by NR6.1, NR6.2 and NR6.3. NR6.2
splices in an exon starting with nucleotide 1224
encoding D408 and ending with nucleotide 1264 encoding
the first nucleotide in the codon for G422 and uses a
different reading frame for the final exon which is
shared with NR6.1 (see Figure 1). NR6.3 fails to splice
from nucleotide 1224; therefore, translation
continues through the intron, giving rise to the
C-terminal protein region.

The sequence of NR6 cDNA products generated by 5'-RACE
amplification from the mouse brain cDNA preparation is
shown in SEQ ID NO: 18. The nucleotide sequence
identified using 5'-RACE appeared to be identical to the
sequence of cDNAs encoding NR6.1, NR6.2 and NR6.3 from
nucleotide C151, the first nucleotide of the codon for
Pro51. 5' of this nucleotide, the sequences diverged
and the sequence is unique, not being found in NR6.1,
NR6.2 or NR6.3. Additionally, there is a single
nucleotide difference, with the sequence from the RACE
containing a G rather than an A at nucleotide 475,
resulting in Thr159 becoming Ala.
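Several of the positions quoted in this Example can be related by simple codon arithmetic. The sketch below assumes that nucleotide 1 is the first base of the codon for the initiation methionine, as stated for SEQ ID NOs:12, 14 and 16; the helper names are illustrative only.

```python
def codon_span(residue):
    """First and last nucleotide of the codon for a 1-based residue number."""
    first = 3 * residue - 2
    return first, first + 2


def residue_of(nucleotide):
    """Residue number whose codon contains a 1-based nucleotide position."""
    return (nucleotide + 2) // 3


# Pro51 is described as starting at nucleotide C151.
print("Pro51 codon:", codon_span(51))

# Nucleotide 475 (A in the cDNA, G in the RACE product) and Thr159.
print("nt 475 lies in residue", residue_of(475), "codon", codon_span(159))

# A 1221 base pair open reading frame running from Met1 to Gln407.
print("Gln407 codon:", codon_span(407))
```

Under this numbering, nucleotide 475 is the first position of codon 159, so an A to G change there converts a threonine codon (ACN) into an alanine codon (GCN), consistent with the Thr159 to Ala difference described above.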

Analysis of the genomic clones revealed that they were
overlapping and contained exons encoding the majority of
the coding region of the three forms of NR6 (Figures 1,
2 and 3). These genomic clones contained exons
encoding from Asp50 (nucleotide 148) of the NR6 cDNAs.
Sequence 5' of this in the cDNAs, including the
5'-untranslated region and the region encoding Met1 to
Gln49 (SEQ ID NOs: 12, 14 and 16), and the 5' end
predicted from analysis of 5' RACE products (SEQ ID
NO: 18) were not present in the two genomic clones
isolated.

Analysis of the NR6 genomic DNA clones also provided an
explanation of the three classes of NR6 cDNAs found. It
is likely that NR6.1, NR6.2 and NR6.3 arise through
alternative splicing of NR6 mRNA (Figure 1). The last
amino acid residue that these different NR6 proteins are
predicted to share is Gln407. SEQ ID NO: 18 shows that
Gln407 is the last amino acid encoded by the exon that
covers nucleotides g5850 to g6037 (see Figure 2).
Alternative splicing from the end of this exon (Figure
1) accounts for the generation of cDNAs encoding NR6.1
(SEQ ID NO:12), NR6.2 (SEQ ID NO:14) and NR6.3 (SEQ ID
NO:16). In the case of NR6.1, the region from g6038 to
g6425 is spliced out, leading to juxtaposition of g6037
and g6426. In the case of NR6.2, the region from g6038
to g6141 is spliced out, an exon from g6142 to g6183 is
retained and then this is followed by splicing out of
the region from g6183 to g6425. NR6.3 appears to arise
when there is no splicing from nucleotide g6038. For
all three forms, a secreted rather than transmembrane
form is generated; these differ, however, in their
predicted C-terminal region. The genomic NR6 sequence
with additional 5' sequence is shown in Figure 3.

EXAMPLE 7

ESTs

Databases were searched with the murine NR6 sequence
corresponding to the unspliced version shown in SEQ ID
NO: 16. The murine NR6 sequence used is shown in SEQ ID
NO: 22.

The databases searched were:

(i) dbEST - Database of Expressed Sequence Tags,
National Center for Biotechnology Information, National
Library of Medicine, 38A, 8N805, 8600 Rockville Pike,
Bethesda, MD 20894, USA. Phone: 0011-1-301-496-2475; Fax:
0015-1-301-480-9241.

(ii) DNA Data Bank of Japan DNA Database Release 3689.
Prepared by: Sanzo Miyazawa (Manager/Database
Administrator), Hidenori Hayashida (Scientific Reviewer),
Yukiko Yamazaki/Eriko Hatada/Hiroaki Serizawa
(Annotators/reviewers), Motono Horie/Shigeko Suzuki/Yumiko
Satao (Secretaries/typists), DNA Data Bank of Japan, National
Institute of Genetics, Center for Genetic Information
Research, Laboratory of Genetic Information Analyses, 1111
Yata, Mishima, Shizuoka 411, Japan.

(iii) EMBL Nucleic Acid Sequence Data Bank Release
47.0.

(iv) EMBL Nucleic Acid Sequence Data Bank Weekly Updates
Since Release 44.

(v) Genetic Sequence Data Bank NCBI-GenBank Release 94,
National Center for Biotechnology Information, National
Library of Medicine, 38A, 8N805, 8600 Rockville Pike,
Bethesda, MD 20894, USA. Phone: 0011-1-301-495-2475; Fax:
0015-1-301-480-9241.

(vi) Cumulative Updates since NCBI-GenBank Release 88,
National Center for Biotechnology Information, National
Library of Medicine, 38A, 8N805, 8600 Rockville Pike,
Bethesda, MD 20894, USA.

The search of the databases with the murine probe
identified several ESTs having sequence similarity to
the probe. The ESTs were:

W66776 (murine sequence)
MM5839 (murine sequence)
AA014965 (murine sequence)
W46604 (human sequence)
W46603 (human sequence)
H14009 (human sequence)
N78873 (human sequence)
R87407 (human sequence).

EXAMPLE 8

Isolation of 3' cDNA clones encoding human NR6

PCR products encoding human NR6 were generated using
oligonucleotides UP1 and LP1 (see below) based on human
ESTs (Genbank Acc:H14009, Genbank Acc:AA042914) that
were identified from databases searched with murine NR6
sequence (SEQ ID NO:22). PCR was performed on a human
fetal liver cDNA library (Marathon ready cDNA, CLONTECH
#7403-1) using Advantage Klen Taq Polymerase mix
(CLONTECH #8417-1) in the buffer supplied at 94°C for
30 s and 68°C for 3 min for 35 cycles followed by 68°C
for 4 min and then stopping at 15°C. A standard PCR
programme for the Perkin-Elmer GeneAmp PCR System 2400
thermal cycler was used. The PCR yielded a prominent
product of approximately 560 base pairs (bp; SEQ ID
NO:18), which was radiolabelled with [α-32P]dCTP using a
random priming method (Amersham, RPN 1607, Mega prime
kit) and used to screen a human fetal kidney 5'-STRETCH
PLUS cDNA library (CLONTECH #HL1150x). Library screens
were performed using Rapid Hybridisation Buffer
(Amersham, RPN 1636) according to the manufacturer's
instructions and membranes were washed at 65°C for 30 min in
0.1 x SSC/0.1% (w/v) SDS. Two independent cDNA clones
were obtained as lambda phage and subsequently subcloned
and sequenced. Both clones (HFK-63 and HFK-66)
contained 1.4 kilobase (kb) inserts that showed sequence
similarity with murine NR6. The sequence and
corresponding amino acid translation of HFK-66 is shown
in SEQ ID NO: 24.

The translated protein sequence of clone HFK-66 shows
a high degree of sequence similarity with the mouse NR6.

OLIGONUCLEOTIDES

UP1: 5' TCC AGG CAG CGG TCG GGG GAC AAC 3' [SEQ ID NO: 26]
LP1: 5' TTG CTC ACA TCG TCC ACC ACC TTC 3' [SEQ ID NO: 27]

EXAMPLE 9
Genomic Structure of Human NR6

Human genomic DNA clones encoding human NR6 were
isolated by screening a human genomic library (Lambda
FIX II, Stratagene 946203) with radiolabelled
oligonucleotides 2199 and 2200 (see below). These
oligonucleotides were designed based on human ESTs
(Genbank Acc:R87407, Genbank Acc:H14009) that were
identified from databases searched with murine NR6.
Filters were hybridised overnight at 37°C in 6 x SSC
containing 2 mg/ml bovine serum albumin, 2 mg/ml Ficoll,
2 mg/ml polyvinylpyrrolidone, 100 mM ATP, 10 mg/ml tRNA,
2 mM sodium pyrophosphate, 2 mg/ml salmon sperm DNA,
0.1% (w/v) SDS and 200 mg/ml sodium azide and washed at
65°C in 6 x SSC/0.1% SDS. Five independent genomic
clones were obtained and sequenced. The extent of
sequence obtained has determined that the clones overlap
and exhibit a similar genomic structure to murine NR6.
Exon coding regions are almost identical over the region
covered by the genomic clones while intron
regions differ, although the sizes of the introns are
comparable. The extent of known overlap is shown in
Fig. 5.

OLIGONUCLEOTIDES:

2199: 5' CCC ACG CTT CTC ATC GGA TTC TCC CTG 3' [SEQ ID
NO: 36]

2200: 5' CAG TCC ACA CTG TCC TCC ACT CGG TAG 3' [SEQ ID
NO: 37]

EXAMPLE 10

Northern Blot Analysis of Human NR6 mRNA Expression

Clontech Multiple Tissue Northern Blots (Human MTN Blot,
CLONTECH #7760-1, Human MTN Blot IV, CLONTECH #7766-1,
Human Brain MTN Blot II, CLONTECH #7755-1, Human Brain
MTN Blot III, CLONTECH #7750) were probed with a
radiolabelled 3' human NR6 cDNA clone, HFK-66 (SEQ ID
NO:24). The clone was labelled with [α-32P]dCTP using a
random priming method (Amersham, RPN 1607, Mega prime
kit). Hybridisation was performed in Express
Hybridisation Solution (CLONTECH H50910) for 3 hours at
67°C and membranes were washed in 0.1 x SSC/0.1% (w/v) SDS
at 50°C.

A 1.8 kb transcript was detected in a variety of human
tissues encompassing reproductive, digestive and neural
tissues. High levels were observed in the heart,
placenta, skeletal muscle, prostate and various areas of
the brain; lower levels were observed in the testis,
uterus, small intestine and colon. Photographs showing
these Northern blots are available upon request. This
expression pattern differs from the expression pattern
observed with murine NR6.

EXAMPLE 11
Mouse NR6 Expression Vectors

pEF-FLAG/mNR6.1

The mature coding region of mouse NR6.1 was amplified
using PCR to introduce an in-frame Asc I restriction
enzyme site at the 5' end of the mature coding region
and an Mlu I site at the 3' end, using the following
oligonucleotides:

5' oligo 5'-AGCTGGCGCGCCTCCCGGGCGGATCGGGAGCCCAC-3' [SEQ
ID NO: 30]

3' oligo 5'-AGCTACGCGTTTAGAGTTTAGCCGGCAG-3' [SEQ ID
NO: 31]

The resulting PCR derived DNA fragment was then digested
with Asc I and Mlu I and cloned into the Mlu I site of
pEF-FLAG. Expression of NR6 is under the control of the
polypeptide chain elongation factor 1α promoter as
described (16) and results in the secretion, using the
IL3 signal sequence from pEF-FLAG, of N-terminal
FLAG-tagged NR6 protein.

pEF-FLAG was generated by modifying the expression
vector pEF-BOS as follows:

pEF-BOS (16) was digested with Xba I and a linker was
synthesized that encoded the mouse IL3 signal sequence
(MVLASSTTSIHTMLLLLLMLFHLGLQASIS) and the FLAG epitope
(DYKDDDDK). Asc I and Mlu I restriction enzyme sites
were also introduced as cloning sites. The sequence of
the linker is as follows:

Encoded residues: M V L A S S T T S I H T M
CTAGACTAGTGCTGACACAATGGTTCTTGCCAGCTCTACCACCAGCATCCACACCATG
    TGATCACGACTGTGTTACCAAGAACGGTCGAGATGGTGGTCGTAGGTGTGGTAC

Encoded residues: L L L L L M L F H L G L Q A S I S, followed by the Asc I site
CTGCTCCTGCTCCTGATGCTCTTCCACCTGGGACTCCAAGCTTCAATCTCGGCGCGCC
GACGAGGACGAGGACTAGCAGAAGGTGGACCCTGAGGTTCGAAGTTAGAGCCGCGCGG

Encoded residues: D Y K D D D D K, followed by the Mlu I site
AGGACTACAAGGACGACGATGACAAGACGCGTGCTAGCACTAGT
TCCTGATGTTCCTGCTGCTACTGTTCTGCGCACGATCGTGATCAGATC

The two oligonucleotides were annealed together and
ligated into the Xba I site of pEF-BOS to give pEF-FLAG.
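The reading frame of the linker can be checked by translating its upper strand. The following sketch assumes the upper strand is read 5' to 3' as printed, that translation starts at the first ATG, and that the standard genetic code applies; the function name is illustrative. The recognition sequences used for Asc I (GGCGCGCC) and Mlu I (ACGCGT) are the standard ones for those enzymes.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Upper strand of the linker, joined from the three printed segments.
TOP_STRAND = (
    "CTAGACTAGTGCTGACACAATGGTTCTTGCCAGCTCTACCACCAGCATCCACACCATG"
    "CTGCTCCTGCTCCTGATGCTCTTCCACCTGGGACTCCAAGCTTCAATCTCGGCGCGCC"
    "AGGACTACAAGGACGACGATGACAAGACGCGTGCTAGCACTAGT"
)

IL3_SIGNAL = "MVLASSTTSIHTMLLLLLMLFHLGLQASIS"
FLAG = "DYKDDDDK"


def translate_from_first_atg(seq):
    """Translate complete codons starting at the first ATG in seq."""
    start = seq.find("ATG")
    if start < 0:
        return ""
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(start, len(seq) - 2, 3)
    )


protein = translate_from_first_atg(TOP_STRAND)
print(protein)
print("signal in frame:", protein.startswith(IL3_SIGNAL))
print("FLAG in frame:", FLAG in protein)
print("Asc I site present:", "GGCGCGCC" in TOP_STRAND)
print("Mlu I site present:", "ACGCGT" in TOP_STRAND)
```

Read this way, the IL3 signal sequence and the FLAG epitope fall in the same reading frame, with the Asc I site between them and the Mlu I site immediately after the FLAG codons.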

pCOS1/FLAG/mNR6 & pCHO1/FLAG/mNR6

A DNA fragment containing the sequences encoding the IL3
signal sequence/FLAG/mNR6 and the poly(A) adenylation
signal from human G-CSF cDNA was excised from
pEF-FLAG/mNR6 using the restriction enzyme EcoR I. This DNA
fragment was then inserted into the EcoR I cloning site
of pCOS1 and pCHO1.

The pCOS1 and pCHO1 vectors were constructed as follows.
pCHO1 is also described in reference (17) but with a
different selectable marker.

pCOS1 was prepared by digesting HEF-12h-gγ1 (see Figure
24 of International Patent Publication No. WO 92/19759)
with EcoRI and SmaI and ligating the digestion product
with an EcoRI-NotI-BamHI adaptor (Takara 4510). The
resulting plasmid comprises an EF1α promoter/enhancer,
Neo^r marker gene, SV40E, ori and an Amp^r marker gene.
pCHO1 was constructed by digesting DHFR-PMh-gγ1 (see
Figure 25 of International Patent Publication No. WO
92/19759) with PvuI and Eco47III and ligating same with
pCOS1 digested with PvuI and Eco47III. The resulting
vector, pCHO1, comprises an EF1α promoter/enhancer, a
DHFR marker gene, SV40E, ori and an Amp^r gene.

EXAMPLE 12

mNR6 has been expressed as an N' FLAG-tagged protein
following transfection of CHO cells and as a C' FLAG-tagged
protein following transfection of KUSA cells. In
both cases varying levels of dimeric and aggregated NR6
were secreted.

EXAMPLE 13
Murine NR6 expression

NR6 expression studies were conducted in murine Northern
blots. At the level of sensitivity used, NR6 expression
in the adult mouse was detected in salivary gland,
lung and testis. During embryonic development, NR6 is
expressed in fetal tissues from day 10 of gestation
through to birth. In cell lines, NR6 expression has
been observed in the T-lymphoid line CTLL-2 as well as
in FD-PyMT (FDC-P1 myeloid cells expressing the polyoma
middle T gene), and fibroblastoid cells including bone
marrow and fetal liver stromal lines.

EXAMPLE 14

Expression, purification and characterisation of CHO and
KUSA mNR6

The methods provide for the production of a dimeric form
of CHO derived N' FLAG-mNR6 without refolding. All
other methods capable of producing NR6 are
encompassed by the present invention.

A. Production of CHO derived N' FLAG-mNR6 (dimeric
form)

(i) Protein Production

To analyse structure and functional activity, a cDNA
fragment containing the entire coding sequence of murine
NR6 with an N-terminal FLAG (N' FLAG) sequence was
cloned into the EcoRI site of the expression vector
pCHO1. For stable production of N-terminal FLAG-tagged
NR6 the vector contains the DHFR (dihydrofolate
reductase) gene as a selective marker with the NR6 gene
under the control of an EF1α promoter. CHO cells were
transfected with the construct using a polycationic
liposome transfection reagent (Lipofectamine, GibcoBRL).

(ii) Lipofectamine transfection method

Using six well tissue culture plates, either 2 x 10^ KUSA
cells in 2 ml IMDM + 10% (v/v) FCS or 2 x 10^ CHO cells
were cultured in 2 ml α-MEM + 10% (v/v) FCS until 70%
confluent. 2 µg DNA diluted in 100 µl OPTI-MEM I (Gibco
BRL, USA) was mixed gently with 12 µl Lipofectamine
diluted in 100 µl OPTI-MEM I and incubated at room
temperature for 30 min to allow DNA complex formation.
DNA complexes were gently diluted in a total volume of
1 ml of OPTI-MEM I and overlaid onto washed KUSA or CHO
cell monolayers. A further 1 ml IMDM + 20% (v/v) FCS
(KUSA cells) or 1 ml α-MEM + 20% (v/v) FCS (CHO cells)
was added to transfected cells after 5 hours. At 24
hours, the culture medium was replaced with fresh
complete growth medium. At 48 hours after transfection,
selection was applied. A methotrexate resistant clone
secreting comparatively high levels of NR6 was selected
and expanded for further analysis.

(iii) Protein expression

CHO cells were grown to confluence in roller bottles in
nucleoside free α-MEM + 10% (v/v) FCS. Selection was
maintained by using 100 ng/ml methotrexate in the
conditioned media according to the manufacturer's
instructions. Expression was monitored by Biosensor and
harvesting was found to be optimal at 3 to 4 days.

B. Protein Analysis

(i) Biosensor analysis

Expression and purification were monitored by Biosensor
analysis (BIAcore(TM), Sweden), in which anti-FLAG peptide
M2 antibody (Eastman Kodak, USA), specific for the FLAG
peptide sequence, was bound to the sensorchip. Fractions
were analysed for binding to the sensor surface
(resonance units) and the sample was then removed from
the surface using 50 mM diethylamine pH 12.0 prior to
analysis of the next fraction. Immobilisation and
running conditions of the Biosensor followed the
manufacturer's instructions.

(ii) Protein Production

In order to generate and characterise NR6, conditioned
media (2 L) produced by CHO cells was harvested after
day 3 post confluence. Conditioned media was
concentrated by diafiltration with a 10,000 molecular
weight cut-off (Easy flow, Sartorius, Aus). At a volume
of 200 ml (i.e. 10 x concentrated) the sample was buffer
exchanged into 20 mM Tris, 0.15 M NaCl, 0.02% (v/v) Tween
20, pH 7.5 (Buffer A).

(iii) Immunoprecipitation and Western Blot analysis of mNR6

Concentrated conditioned media (1 ml) was
immunoprecipitated with M2 affinity resin (20 μl, Eastman
Kodak). To characterise the structure of mNR6, SDS PAGE
was performed under reducing and non-reducing
conditions. Separation was performed on NOVEX 4-20% (v/v)
Tris/glycine gradient gels and protein was transferred
onto PVDF membrane. Western blots were probed with
biotinylated M2 antibody (primary, 1:500) and then
streptavidin peroxidase (secondary, 1:3000). Samples
were visualised by autoradiography using
electrochemiluminescence (ECL, Dupont, USA).

By regression analysis of prestained standards
(BIORAD, Aus.) the molecular weight of the monomeric
unit was calculated to be 65,000 daltons. Under
non-reducing conditions the molecular weight was
calculated to be 127,000 daltons, indicating that NR6 is
a disulphide linked dimer. A tetrameric complex running
at approximately 250,000 daltons was also observed.
Although a band running at approximately 50,000 daltons
was observed, no monomeric NR6 was detected under
non-reducing conditions, indicating that the majority of
NR6 expressed in this system is disulphide linked.
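
The molecular weight estimates above come from a standard-curve regression: log10 of each marker's molecular weight is fitted against its relative migration, and the unknown band is read off the fitted line. The following is a minimal sketch of that calculation and of the oligomer-state comparison, not the procedure actually used; the marker values and Rf figures are hypothetical, and only the 65,000, 127,000 and 250,000 dalton figures come from the text.

```python
import math

def fit_log_mw(standards):
    """Least-squares line of log10(MW) against relative migration (Rf).

    standards: list of (rf, mw_daltons) pairs for the prestained markers.
    Returns (slope, intercept).
    """
    xs = [rf for rf, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_mw(rf, slope, intercept):
    """Read a molecular weight off the fitted standard curve."""
    return 10 ** (slope * rf + intercept)

def oligomer_state(observed_mw, monomer_mw):
    """Nearest whole number of monomer units in an observed band."""
    return round(observed_mw / monomer_mw)

# Hypothetical marker set (Rf, daltons); real values depend on the gel run.
HYPOTHETICAL_STANDARDS = [
    (0.15, 200_000), (0.30, 116_000), (0.45, 66_000),
    (0.65, 31_000), (0.85, 14_400),
]

slope, intercept = fit_log_mw(HYPOTHETICAL_STANDARDS)
print(round(estimate_mw(0.45, slope, intercept)))

# Figures from the text: 127,000 and 250,000 against a 65,000 monomer.
print(oligomer_state(127_000, 65_000), oligomer_state(250_000, 65_000))
```

With the figures reported above, the non-reducing bands correspond to two and four monomer units, which is the basis for describing them as dimer and tetramer.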

(iv) Affinity Chromatography of mNR6

Concentrated conditioned media (200 ml) was applied to
M2 affinity resin (5 ml) under gravity. To enhance
recovery the unbound fraction was reapplied to the
column four times prior to extensive washing of the
column with 200 volumes of Buffer A. Biosensor analysis
indicated that approximately 20% of the M2 binding
originally present in the concentrate remained in the
unbound fraction. The bound fraction was eluted from the
column using an immunodesorbent (50 ml; ActiSep,
Sterogene Labs, USA).

(v) Ion exchange and Desalting of mNR6

In order to buffer exchange mNR6 prior to anion exchange
chromatography, 10 ml batches of the eluted fraction (50
ml) were applied to an XK column (400 x 26 mm I.D.)
containing G25 Sepharose (Pharmacia, Sweden).
Chromatography was developed at 4 ml/min using an FPLC
(Pharmacia, Sweden) equipped with online UV280 and
conductivity monitors. The mobile phase was 10 mM Tris,
0.1 M NaCl, 0.02% (v/v) Tween, pH 8.0. 10 ml fractions
were collected between 12.5 min and 25 min to optimise
recovery and removal of salt. Fractions were analysed by
Biosensor analysis and pooled according to binding.

All pooled active fractions were diluted with an equal
volume of 20 mM Tris, 0.02% (v/v) Tween, pH 8.5 (Buffer
B) and then loaded onto a Mono Q 5/5 (Pharmacia, Sweden)
at a flow rate of 2 ml/min. The column was washed with
buffer B. Elution was performed using a linear gradient
between buffer B and buffer B containing 0.6 M NaCl over
30 min at a flow rate of 1 ml/min. Fractions (1 minute)
were collected and analysed on the Biosensor and also by
SDS PAGE and Western blot analysis. Fractions 15 to 26
(approximately 0.4 M NaCl) appear to contain the majority
of mNR6 as indicated by the Biosensor.
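
The salt concentration quoted for the pooled fractions can be checked against the programmed gradient. The sketch below is illustrative only: it assumes that fraction 1 starts at the beginning of the gradient and ignores column and tubing dead volume, neither of which is stated in the text. The function names are hypothetical.

```python
def nacl_at_time(t_min, start_molar=0.0, end_molar=0.6, gradient_min=30.0):
    """Programmed NaCl concentration (M) at time t_min of a linear gradient."""
    fraction_done = min(max(t_min / gradient_min, 0.0), 1.0)
    return start_molar + (end_molar - start_molar) * fraction_done

def nacl_for_fraction(fraction_number, fraction_min=1.0):
    """Programmed NaCl concentration at the midpoint of a timed fraction.

    Assumes fraction 1 begins at gradient start (not stated in the source).
    """
    midpoint_min = (fraction_number - 0.5) * fraction_min
    return nacl_at_time(midpoint_min)

pooled = range(15, 27)  # fractions 15 to 26 inclusive
mean_nacl = sum(nacl_for_fraction(n) for n in pooled) / len(pooled)
print(f"{nacl_for_fraction(15):.2f} M to {nacl_for_fraction(26):.2f} M, "
      f"mean {mean_nacl:.2f} M")
```

Under these assumptions the pooled fractions span a nominal 0.29 to 0.51 M NaCl with a mean of 0.40 M, in line with the approximate figure given above.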

C. Production of CHO derived N' FLAG-mNR6 (monomeric form)

(i) Protein Production

A cDNA fragment containing the entire coding sequence of
murine NR6 with an N-terminal FLAG sequence was cloned
into the expression vector pCHO1 for production of
N-terminal FLAG-tagged protein. This vector contains a
neomycin resistance gene with expression of the NR6 gene
under the control of an EF1α promoter. This expression
construct was transfected into CHO cells using
Lipofectamine (Gibco BRL, USA) according to the
manufacturer's instructions. Transfected cells were
cultured in IMDM + 10% (v/v) FCS with resistant cells
selected in geneticin (600 μg/ml, Gibco BRL, USA). A
neomycin resistant clone secreting comparatively high
levels of NR6 was selected and expanded for further
analysis.

(ii) Protein expression

FLAG-NR6 expressed in serum free conditioned media
(10 litre) was harvested from transfected CHO cells.
Collected media was concentrated using a CH2
ultrafiltration system equipped with an S1Y10 cartridge
(Amicon, molecular weight cut-off 10,000). Preliminary
examination of the expressed product by reducing and
non-reducing SDS PAGE followed by western blot analysis
was performed. Visualisation of the protein on Westerns
was specific to the primary antibody, anti-FLAG M2. Under
reducing conditions a band at approximately 65,000
daltons was observed. Under non-reducing conditions,
dimer and larger molecular weight aggregates were
observed. These are disulphide linked monomers as they
are not present in the reducing gel. Small amounts of
monomer appear to be present in non-reducing gels.

(iii) Affinity Chromatography of NR6

Concentrated conditioned media was applied to an
anti-FLAG M2 affinity resin (100 x 16 mm I.D.). After
washing the unbound proteins off the column, the bound
proteins were eluted using FLAG peptide (60 μg/ml) in PBS.

(iv) Ion Exchange Chromatography of NR6

Eluted fractions from the affinity column were dialysed
overnight against 20 mM Tris-HCl pH 8.5 (buffer C)
containing 50 mM dithiothreitol (DTT) using 25,000
cut-off dialysis tubing (Spectra/Por 7, Spectrum). The
dialysed fractions were loaded onto a Mono Q 5/5
(Pharmacia, Sweden) previously equilibrated with buffer
C containing 5 mM DTT. Chromatography was developed
using a linear gradient between buffer C and buffer C
containing 1.0 M NaCl at a flow rate of 0.5 ml/min.

(v) Refolding of NR6

Fractions containing NR6 from the Mono Q were adjusted
to 50 mM DTT and left overnight at 4°C. To initiate
refolding the sample was then dialysed against 50 mM
Tris-HCl (pH 8.5), 2 M urea, 0.1% (v/v) Tween 20, 10 mM
glutathione (reduced) and 2 mM glutathione (oxidised) at
a final protein concentration of 100 μg/ml. Folding
was carried out at ambient temperature with one change
of the buffer over 24 hours.

(vi) Reversed Phase High Performance Liquid
Chromatography (RP-HPLC)

The folded product was further purified by RP-HPLC using
a Vydac C4 resin (250 x 4.6 mm I.D.) previously
equilibrated with 0.1% (v/v) trifluoroacetic acid (TFA).
Elution was carried out using a linear gradient from 0
to 80% (v/v) acetonitrile / 0.1% (v/v) TFA at a flow
rate of 1 ml per minute.

D. pCHO1/NR6/FLAG

In order to determine the native N terminus of NR6, a C
terminal FLAG NR6 CHO cell line was established.

The plasmid pKUSA166 (murine NR6 cDNA cloned into the
EcoR I site of pBLUESCRIPT) was digested with BamH I to
remove the sequences encoding the last 15 amino acids of
murine NR6. Synthetic oligonucleotides which encode the
3' end of mouse NR6 followed by the FLAG peptide tag
were annealed and ligated into the BamH I site of
pKUSA166. The sequence of the oligonucleotides was as
follows:

I L P S G R R G A A R G P A G D Y K D D D D K * [SEQ ID NO: 34]

GATCTTGCCCTCGGGCAGACGGGGTGCGGCGAGAGGTCCTGCCGGCGACTACAAGG
ACGACGATGACAAGTAG [SEQ ID NO: 33]

AACGGGAGCCCGTCTGCCCCACGCCGCTCTCCAGGACGGCCGCTGATGTTCCTGCT
GCTACTGTTCATCCTAG [SEQ ID NO: 35]

The 5' end of the linker introduces a silent mutation
(CTG > TTG) to destroy the 5' BamH I site upon
insertion of the linker. The NR6 cDNA (with native
signal sequence) with the C-terminal FLAG was cut out of
pKUSA166 with EcoR I and BamH I and cloned into the EcoR
I - BamH I cloning sites of pCHO1. This vector results
in the secretion of NR6 protein with a C-terminal FLAG
tag (C' FLAG-mNR6).

This vector results in the secretion of NR6 protein from
KUSA cells. The vector pCHO1 has been previously
described in (17) although with a different selectable
marker.
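
The linker design described above can be checked directly from the printed sequences. The sketch below translates the top strand in the reading frame that follows its first base, confirms that the second oligonucleotide pairs with it when read 3' to 5' (an inference from the base pairing, not something the text states), and compares the two ligation junctions with the BamH I recognition sequence GGATCC. It is an illustration, not the authors' design procedure.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# SEQ ID NO: 33, written 5'->3'.
TOP = ("GATCTTGCCCTCGGGCAGACGGGGTGCGGCGAGAGGTCCTGCCGGCGACTACAAGG"
       "ACGACGATGACAAGTAG")
# SEQ ID NO: 35 as printed; assumed here to be written 3'->5'.
BOTTOM_3_TO_5 = ("AACGGGAGCCCGTCTGCCCCACGCCGCTCTCCAGGACGGCCGCTGATGTTCCTGCT"
                 "GCTACTGTTCATCCTAG")

def translate(dna):
    """Translate complete codons from the first base of dna."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def complement(dna):
    """Base-by-base complement, without reversing."""
    return dna.translate(COMPLEMENT)

# The coding frame starts after the first G of the GATC overhang.
peptide = translate(TOP[1:])

# The bottom strand should pair with everything after the top GATC overhang
# and carry its own GATC overhang (CTAG when read 3'->5').
duplex_ok = complement(TOP[4:]) + "CTAG" == BOTTOM_3_TO_5

# Junctions after ligation into a BamH I-cut site (vector ends ...G and GATCC...).
upstream_junction = "G" + TOP[:5]
downstream_junction = TOP[-1] + "GATCC"

print(peptide)
print(duplex_ok, upstream_junction, downstream_junction)
```

The upstream junction no longer reads GGATCC because of the CTG > TTG change, while the downstream junction regenerates the site, which is consistent with the subsequent excision of the tagged cDNA using EcoR I and BamH I.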

(i) Production of polyclonal NR6 antiserum

The following peptide from the N terminal region of NR6
was chosen for production of polyclonal antiserum to NR6:

VISPQDPTLLIGSSLQATCSIHGDTP [SEQ ID NO: 39]

The peptide was conjugated to KLH and injected into
rabbits. Production and purification of the polyclonal
antibody specific to the NR6 peptide sequence follows

(ii) Protein expression

KUSA cells transfected with cDNA of C terminal tagged
mNR6 were grown to confluence in flasks (800 ml) using
IMDM media containing 10% (v/v) FBS. Conditioned media
(100 ml) was harvested 3-4 days post confluence.

(iii) Characterisation of NR6 by Immunoprecipitation
and Western blotting

In order to establish that NR6 with the predicted
sequence is produced in KUSA cells transfected with the
cDNA, western blot analysis using both M2 antibody and
purified NR6 specific rabbit antibody was performed.
Conditioned media (1 to 5 ml) was immunoprecipitated
with M2 affinity resin (10-20 μl). After sufficient
time for binding, the beads were washed with MT-PBS and
NR6 was subsequently eluted with 100 μg/ml FLAG peptide
(40 μl, (1, 5 minute incubation). The sample was then
subjected to reducing and non-reducing SDS PAGE followed
by western blot analysis. Both purified NR6 polyclonal
antibody (purified by protein G) and M2 antibody
recognise a band under reducing conditions with a
molecular weight of approximately 65,000 daltons.
Since the two antibodies recognise residues at the N
terminus and C terminus respectively, it is reasonable
to assume that full length NR6 is produced.
Biotinylation of the respective antibodies by standard
methods reduces the background. Under non-reducing
conditions polyclonal NR6 antibodies bind to a band with
a molecular weight of approximately 127,000, consistent
with a disulphide linked dimeric form of NR6. Minor
components of tetrameric NR6 are present; no monomeric
NR6 is evident using polyclonal NR6 antibodies.
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EXAMPLE 15 
Generation of NR6 knockout mice 

To construct the NR6 targeting vector, 4.1 kb of genomic
NR6 DNA containing exons 2 through to 6 was deleted and
replaced with a G418-resistance cassette, leaving 5' and
3' NR6 arms of 2.9 and 4.5 kb respectively. A 4.5 kb
XhoI fragment of the murine genomic NR6 clone 2.2
(Figure 3) containing exons 7, 8 and 3' flanking
sequence was subcloned into the XhoI site of pBluescript
generating pBSNR6Xho4.5. A 2.9 kb NotI-StuI fragment
within NR6 intron 1 from the same genomic clone was
inserted into NotI and EcoRV digested pBSNR6Xho4.5
creating pNR6-Ex2-6. This plasmid was digested with
ClaI, which was situated between the two NR6 fragments,
and following blunt ending, ligated with a blunted 6 kb
HindIII fragment from placZneo, which contains the
lacZ gene and a PGKneo cassette, to generate the final
targeting vector, pNR6lacZneo. pNR6lacZneo was
linearised with NotI and electroporated into W9.5
embryonic stem cells. After 48 hours, transfected cells
were selected in 175 μg/ml G418 and resistant clones
picked and expanded after a further 8 days.

Clones in which the targeting vector had recombined
with the endogenous NR6 gene were identified by
hybridising SpeI-digested genomic DNA with a 0.6 kb
XhoI-StuI fragment from genomic NR6 clone 2.2. This
probe (probe A, Figure 4), which is located 3' to the
NR6 sequences in the targeting vector, distinguished
between the endogenous (9.9 kb) and targeted (7.1 kb)
NR6 loci (Figure 5).

Genomic DNA was digested with SpeI for 16 hrs at 37°C,
electrophoresed through 0.8% (w/v) agarose, transferred
to nylon membranes and hybridised to 32P-labelled probe
in a solution containing 0.5 M sodium phosphate, 7% (w/v)
SDS, 1 mM EDTA and washed in a solution containing 40 mM
sodium phosphate, 1% (w/v) SDS at 65°C. Hybridising
bands were visualised by autoradiography for 16 hours at
-70°C using Kodak XAR-5 film and intensifying screens.
Two targeted ES cell clones, W9.5NR6-2-44 and
W9.5NR6-4-2, were injected into C57Bl/6 blastocysts to
generate chimeric mice. Male chimeras were mated with
C57Bl/6 females to yield NR6 heterozygotes which were
subsequently interbred to produce wild-type (NR6+/+),
heterozygous (NR6+/-) and mutant (NR6-/-) mice. The
genotypes of offspring were determined by Southern Blot
analysis of genomic DNA extracted from tail biopsies.

Genotyping of mice at weaning from matings between NR6+/-
heterozygous mice derived from both targeted ES cell
clones revealed an absence of homozygous NR6-/- mutants.
As no unusual loss of mice was observed between birth
and weaning, this suggests that lack of NR6 is lethal
during embryonic development or immediately after birth.
Genotyping of embryonic tissues at various stages of
development suggests that death occurs late in gestation
(beyond day 16) or at birth.
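
The Southern blot genotyping used in this example reduces to asking which of the two SpeI fragments detected by probe A is present in a sample. The sketch below illustrates that decision using the 9.9 kb and 7.1 kb fragment sizes given in Example 15; the function name and the sizing tolerance are hypothetical and would in practice depend on gel resolution.

```python
ENDOGENOUS_KB = 9.9  # SpeI fragment from the endogenous NR6 locus
TARGETED_KB = 7.1    # SpeI fragment from the targeted NR6 locus

def call_genotype(band_sizes_kb, tolerance_kb=0.4):
    """Assign a genotype from the band sizes seen with probe A.

    tolerance_kb is an assumed sizing error, not a value from the text.
    """
    def present(expected_kb):
        return any(abs(size - expected_kb) <= tolerance_kb
                   for size in band_sizes_kb)

    has_endogenous = present(ENDOGENOUS_KB)
    has_targeted = present(TARGETED_KB)
    if has_endogenous and has_targeted:
        return "NR6+/-"
    if has_endogenous:
        return "NR6+/+"
    if has_targeted:
        return "NR6-/-"
    raise ValueError("no band matches either expected fragment size")

for bands in ([9.8], [9.9, 7.2], [7.0]):
    print(bands, call_genotype(bands))
```

In these terms, the result reported above is that no weaned animal from heterozygous intercrosses showed the 7.1 kb band alone.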

EXAMPLE 16

Oligonucleotides

1943:
5' GTC CAA GTG CGT TGT AAC CCA 3'

2070:
5' GCT GAG TGT GCG CTG GGT CTC ACC 3'

2057:
5' GGC TCC ACT CGC TCC AGA 3'
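
The listed oligonucleotides can be placed on the murine NR6 cDNA by a plain string search on both strands. The sketch below does this for a 96-nucleotide stretch of the coding sequence (nucleotides 913 to 1008 of SEQ ID NO: 12, the region encoding the Trp-Ser-Glu-Trp-Ser motif). The choice of fragment and the helper names are made here for illustration; the text does not state how these oligonucleotides were used.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Nucleotides 913-1008 of the coding sequence in SEQ ID NO: 12.
FRAGMENT = ("AAGCCCGGCACCGTTTACTTCGTCCAAGTGCGTTGTAACCCATTCGGG"
            "ATCTATGGGTCGAAAAAGGCGGGAATCTGGAGCGAGTGGAGCCACCCC")

OLIGOS = {
    "1943": "GTC CAA GTG CGT TGT AAC CCA",
    "2070": "GCT GAG TGT GCG CTG GGT CTC ACC",
    "2057": "GGC TCC ACT CGC TCC AGA",
}

def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def locate(template, oligo):
    """Return (strand, 0-based index) of an oligo on the template, or None."""
    oligo = oligo.replace(" ", "")
    pos = template.find(oligo)
    if pos != -1:
        return ("sense", pos)
    pos = template.find(reverse_complement(oligo))
    if pos != -1:
        return ("antisense", pos)
    return None

for name, sequence in OLIGOS.items():
    print(name, locate(FRAGMENT, sequence))
```

Within this fragment, 1943 matches the sense strand and 2057 matches the antisense strand; 2070 returns None here because this short fragment does not extend far enough upstream to include it.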

Those skilled in the art will appreciate that the
invention described herein is susceptible to variations
and modifications other than those specifically
described. It is to be understood that the invention
includes all such variations and modifications. The
invention also includes all of the steps, features,
compositions and compounds referred to or indicated in
this specification, individually or collectively, and
any and all combinations of any two or more of said
steps or features.
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SEQUENCE LISTING 



(1) GENERAL INFORMATION:

(i) APPLICANT: (Other than US) AMRAD OPERATIONS PTY LTD
(US only) Douglas James HILTON, Nicos Antony
NICOLA, Alison FARLEY, Tracey WILLSON, Jian-Guo ZHANG,
Warren ALEXANDER, Steven RAKAR, Louis FABRI, Tetsuo
KOJIMA, Masatsugu MAEDA, Yasufumi KIKUCHI, Andrew NASH

(ii) TITLE OF INVENTION: A NOVEL HAEMOPOIETIN
RECEPTOR AND GENETIC SEQUENCES ENCODING SAME

(iii) NUMBER OF SEQUENCES: 39

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: DAVIES COLLISON CAVE
(B) STREET: 1 LITTLE COLLINS STREET
(C) CITY: MELBOURNE
(D) STATE: VICTORIA
(E) COUNTRY: AUSTRALIA

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: PCT INTERNATIONAL APPLICATION
(B) FILING DATE: 11-SEP-1997

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: PO2246/96
(B) FILING DATE: 11-SEP-1996

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: HUGHES DR, E JOHN L
(C) REFERENCE/DOCKET NUMBER: EJH/AF

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: +61 3 9254 2777
(B) TELEFAX: +61 3 9254 2770



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Trp Ser Xaa Trp Ser

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

ACTCGCTCCA GATTCCCGCC TTTT  24

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

TCCCGCCTTT TTCGACCCAT AGAT  24



(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

GGTACTTGGC TTGGAAGAGG AAAT  24

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

CGGCTCACGT GCACGTCGGG TGGG  24

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

AGCTGCTGTT AAAGGGCTTC TC  22

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Oligonucleotide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

(A/G)CTCCA(A/G)TC(A/G)CTCCA  15

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Oligonucleotide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

(A/G)CTCCA(C/T)TC(A/G)CTCCA  15



(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

AAGTGTGACC ATCATGTGGA C  21

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

GGAGGTGTTA AGGAGGCG  18

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

ATGCCCGCGG GTCGCCCG  18



(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1506 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..1242

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

GGCACGAGCT TCGCTGTCCG CGCCCAGTGA CGCGCGTGCG GACCCGAGCC CCAATCTGCA  -64

CCCCGCAGAC TCGCCCCCGC CCCATACCGG CGTTGCAGTC ACCGCCCGTT GCGCGCCACC  -4

CCC  -3

ATG CCC GCG GGT CGC CCG GGC CCC GTC GCC CAA TCC GCG CGG CGG CCG  48
Met Pro Ala Gly Arg Pro Gly Pro Val Ala Gln Ser Ala Arg Arg Pro
1               5                   10                  15

CCG CGG CCG CTG TCC TCG CTG TGG TCG CCT CTG TTG CTC TGT GTC CTC  96
Pro Arg Pro Leu Ser Ser Leu Trp Ser Pro Leu Leu Leu Cys Val Leu
            20                  25                  30

GGG GTG CCT CGG GGC GGA TCG GGA GCC CAC ACA GCT GTA ATC AGC CCC  144
Gly Val Pro Arg Gly Gly Ser Gly Ala His Thr Ala Val Ile Ser Pro
        35                  40                  45

CAG GAC CCC ACC CTT CTC ATC GGC TCC TCC CTG CAA GCT ACC TGC TCT  192
Gln Asp Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser
    50                  55                  60

ATA CAT GGA GAC ACA CCT GGG GCC ACC GCT GAG GGG CTC TAC TGG ACC  240
Ile His Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr
65                  70                  75                  80

CTC AAT GGT CGC CGC CTG CCC TCT GAG CTG TCC CGC CTC CTT AAC ACC  288
Leu Asn Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr
                85                  90                  95

TCC ACC CTG GCC CTG GCC CTG GCT AAC CTT AAT GGG TCC AGG CAG CAG  336
Ser Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln Gln
            100                 105                 110

TCA GGA GAC AAT CTG GTG TGT CAC GCC CGA GAC GGC AGC ATT CTG GCT  384
Ser Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu Ala
        115                 120                 125

GGC TCC TGC CTC TAT GTT GGC TTG CCC CCT GAG AAG CCC TTT AAC ATC  432
Gly Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn Ile
    130                 135                 140

AGC TGC TGG TCC CGG AAC ATG AAG GAT CTC ACG TGC CGC TGG ACA CCG  480
Ser Cys Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro
145                 150                 155                 160

GGT GCA CAC GGG GAG ACA TTC TTA CAT ACC AAC TAC TCC CTC AAG TAC  528
Gly Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr
                165                 170                 175

AAG CTG AGG TGG TAC GGT CAG GAT AAC ACA TGT GAG GAG TAC CAC ACT  576
Lys Leu Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His Thr
            180                 185                 190

GTG GGC CCT CAC TCA TGC CAT ATC CCC AAG GAC CTG GCC CTC TTC ACT  624
Val Gly Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe Thr
        195                 200                 205

CCC TAT GAG ATC TGG GTG GAA GCC ACC AAT CGC CTA GGC TCA GCA AGA  672
Pro Tyr Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala Arg
    210                 215                 220

TCT GAT GTC CTC ACA CTG GAT GTC CTG GAC GTG GTG ACC ACG GAC CCC  720
Ser Asp Val Leu Thr Leu Asp Val Leu Asp Val Val Thr Thr Asp Pro
225                 230                 235                 240

CCA CCC GAC GTG CAC GTG AGC CGC GTT GGG GGC CTG GAG GAC CAG CTG  768
Pro Pro Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln Leu
                245                 250                 255

AGT GTG CGC TGG GTC TCA CCA CCA GCT CTC AAG GAT TTC CTC TTC CAA  816
Ser Val Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe Gln
            260                 265                 270

GCC AAG TAC CAG ATC CGC TAC CGC GTG GAG GAC AGC GTG GAC TGG AAG  864
Ala Lys Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp Lys
        275                 280                 285

GTG GTG GAT GAC GTC AGC AAC CAG ACC TCC TGC CGT CTC GCG GGC CTG  912
Val Val Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly Leu
    290                 295                 300

AAG CCC GGC ACC GTT TAC TTC GTC CAA GTG CGT TGT AAC CCA TTC GGG  960
Lys Pro Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly
305                 310                 315                 320

ATC TAT GGG TCG AAA AAG GCG GGA ATC TGG AGC GAG TGG AGC CAC CCC  1008
Ile Tyr Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His Pro
                325                 330                 335

ACC GCT GCC TCC ACC CCT CGA AGT GAG CGC CCG GGC CCG GGC GGC GGG  1056
Thr Ala Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly Gly
            340                 345                 350

GTG TGC GAG CCG CGG GGC GGC GAG CCC AGC TCG GGC CCG GTG CGG CGC  1104
Val Cys Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg Arg
        355                 360                 365

GAG CTC AAG CAG TTC CTC GGC TGG CTC AAG AAG CAC GCA TAC TGC TCG  1152
Glu Leu Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys Ser
    370                 375                 380

AAC CTT AGT TTC CGC CTG TAC GAC CAG TGG CGT GCT TGG ATG CAG AAG  1200
Asn Leu Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln Lys
385                 390                 395                 400

TCA CAC AAG ACC CGA AAC CAG GTC CTG CCG GCT AAA CTC TAAGGATAGG  1249
Ser His Lys Thr Arg Asn Gln Val Leu Pro Ala Lys Leu
                405                 410

CCATCCTCCT GCTGGGTCAG ACCTGGAGGC TCACCTGAAT TGGAGCCCCT CTGTACCATC  1309

TGGGCAACAA AGAAACCTAC CAGAGGCTGG GGCACAATGA GCTCCCACAA CCACAGCTTT  1369

GGTCCACATG ATGGTCACAC TTGGATATAC CCCAGTGTGG GTAAGGTTGG GGTATTGCAG  1429

GGCCTCCCAA CAATCTCTTT AAATAAATAA AGGAGTTGTT CAGGTAAAAA AAAAAAAAAA  1489

AAAAAAAAAA AAAAAAA  1506

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 413 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

Met Pro Ala Gly Arg Pro Gly Pro Val Ala Gln Ser Ala Arg Arg Pro
1               5                   10                  15

Pro Arg Pro Leu Ser Ser Leu Trp Ser Pro Leu Leu Leu Cys Val Leu
            20                  25                  30

Gly Val Pro Arg Gly Gly Ser Gly Ala His Thr Ala Val Ile Ser Pro
        35                  40                  45

Gln Asp Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser
    50                  55                  60

Ile His Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr
65                  70                  75                  80

Leu Asn Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr
                85                  90                  95

Ser Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln Gln
            100                 105                 110

Ser Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu Ala
        115                 120                 125

Gly Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn Ile
    130                 135                 140

Ser Cys Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro
145                 150                 155                 160

Gly Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr
                165                 170                 175

Lys Leu Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His Thr
            180                 185                 190

Val Gly Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe Thr
        195                 200                 205

Pro Tyr Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala Arg
    210                 215                 220

Ser Asp Val Leu Thr Leu Asp Val Leu Asp Val Val Thr Thr Asp Pro
225                 230                 235                 240

Pro Pro Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln Leu
                245                 250                 255

Ser Val Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe Gln
            260                 265                 270

Ala Lys Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp Lys
        275                 280                 285

Val Val Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly Leu
    290                 295                 300

Lys Pro Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly
305                 310                 315                 320

Ile Tyr Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His Pro
                325                 330                 335

Thr Ala Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly Gly
            340                 345                 350

Val Cys Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg Arg
        355                 360                 365

Glu Leu Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys Ser
    370                 375                 380

Asn Leu Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln Lys
385                 390                 395                 400

Ser His Lys Thr Arg Asn Gln Val Leu Pro Ala Lys Leu
                405                 410

(2) INFORMATION FOR SEQ ID NO: 14 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 154 9 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
3 0 (D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



3 5 (ix) FEATURE: 

(A) NAME/ KEY: CDS 

(B) LOCATION: 1 . . 1278 
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BNSDOCID: <WO 981 1225A3JA> 



SUBSTITUTE SHEET (RULE 26) 



wo 98/1 1225 PCT/GB97/02479 
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:14: 



GGCACGAGCT TCGCTGTCCG CGCCCAGTGA 

5 

CCCCGCAGAC TCGCCCCCGC CCCATACCGG 
CCCA 

10 

ATG CCC GCG GGT CGC CCG GGC CCC 
Met Pro Ala Gly Arg Pro Gly Pro 
1 5 

15 CCG CGG CCG CTG TCC TCG CTG TGG 

Pro Arg Pro Leu Ser Ser Leu Trp 
20 



CGCGCGTGCG GACCCGAGCC CCAATCTGCA -65 
CGTTGCAGTC ACCGCCCGTT GCGCGCCACC -5 

-1 

GTC GCC CAA TCC GCG CGG CGG CCG 4 8 

Val Ala Gin Ser Ala Arg Arg Pro 
10 15 

TCG CCT CTG TTG CTC TGT GTC CTC 96 
Ser Pro Leu Leu Leu Cys Val Leu 
25 30 



GGG GTG CCT CGG GGC GGA TCG GGA GCC CAC ACA GCT GTA ATC AGC CCC 144 
2 0 Gly Val Pro Arg Gly Gly Ser Gly Ala His Thr Ala Val He Ser Pro 

35 40 45 

CAG GAC CCC ACC CTT CTC ATC GGC TCC TCC CTG CAA GCT ACC TGC TCT 192 
Gin Asp Pro Thr Leu Leu He Gly Ser Ser Leu Gin Ala Thr Cys Ser 
25 50 55 60 

ATA CAT GGA GAC ACA CCT GGG GCC ACC GCT GAG GGG CTC TAC TGG ACC 24 0 

He His Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr 
65 70 75 80 



30 



288 



CTC AAT GGT CGC CGC CTG CCC TCT GAG CTG TCC CGC CTC CTT AAC ACC 
Leu Asn Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr 
85 90 95 

35 TCC ACC CTG GCC CTG GCC CTG GCT AAC CTT AAT GGG TCC AGG CAG CAG 3 36 

Ser Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gin Gin 
100 105 110 

- 84 - 

SUBSTITUTE SHEET (RULE 26) 



BNSDOCID: <WO ^981 1225A3JA> 



wo 98/11225 



PCT/GB97/02479 



TCA GGA GAC AAT CTG GTG TGT CAC GCC CGA GAC GGC AGC ATT CTG GCT 3 84 

Ser Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser lie Leu Ala 
115 120 125 

5 GGC TCC TGC CTC TAT GTT GGC TTG CCC CCT GAG AAG CCC TTT AAC ATC 4 32 

Gly Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn lie 
130 135 140 

AGC TGC TGG TCC CGG AAC ATG AAG GAT CTC ACG TGC CGC TGG ACA CCG 4 80 

10 Ser Cys Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro 

145 150 155 160 

GGT GCA CAC GGG GAG ACA TTC TTA CAT ACC AAC TAC TCC CTC AAG TAC 528 
Gly Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr 
15 165 170 175 

AAG CTG AGG TGG TAC GGT CAG GAT AAC ACA TGT GAG GAG TAC CAC ACT 576 

Lys Leu Arg Trp Tyr Gly Gin Asp Asn Thr Cys Glu Glu Tyr His Thr 
180 185 190 

20 

GTG GGC CCT CAC TCA TGC CAT ATC CCC AAG GAC CTG GCC CTC TTC ACT 624 

Val Gly Pro His Ser Cys His lie Pro Lys Asp Leu Ala Leu Phe Thr 
195 200 205 

2 5 CCC TAT GAG ATC TGG GTG GAA GCC ACC AAT CGC CTA GGC TCA GCA AGA 672 

Pro Tyr Glu lie Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala Arg 
210 215 220 

TCT GAT GTC CTC ACA CTG GAT GTC CTG GAC GTG GTG ACC ACG GAC CCC 720 

3 0 Ser Asp Val Leu Thr Leu Asp Val Leu Asp Val Val Thr Thr Asp Pro 

225 230 235 240 

CCA CCC GAC GTG CAC GTG AGC CGC GTT GGG GGC CTG GAG GAC CAG CTG 76 8 

Pro Pro Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gin Leu 
35 245 250 255 
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SUBSTITUTE SHEET (RULE 26) 



wo 98/11225 PCT/GB97/02479 

AGT GTG CGC TGG GTC TCA CCA CCA GCT CTC AAG GAT TTC CTC TTC CAA 816
Ser Val Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe Gln
260 265 270

GCC AAG TAC CAG ATC CGC TAC CGC GTG GAG GAC AGC GTG GAC TGG AAG 864
Ala Lys Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp Lys
275 280 285

GTG GTG GAT GAC GTC AGC AAC CAG ACC TCC TGC CGT CTC GCG GGC CTG 912
Val Val Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly Leu
290 295 300

AAG CCC GGC ACC GTT TAC TTC GTC CAA GTG CGT TGT AAC CCA TTC GGG 960
Lys Pro Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly
305 310 315 320

ATC TAT GGG TCG AAA AAG GCG GGA ATC TGG AGC GAG TGG AGC CAC CCC 1008
Ile Tyr Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His Pro
325 330 335

ACC GCT GCC TCC ACC CCT CGA AGT GAG CGC CCG GGC CCG GGC GGC GGG 1056
Thr Ala Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly Gly
340 345 350

GTG TGC GAG CCG CGG GGC GGC GAG CCC AGC TCG GGC CCG GTG CGG CGC 1104
Val Cys Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg Arg
355 360 365

GAG CTC AAG CAG TTC CTC GGC TGG CTC AAG AAG CAC GCA TAC TGC TCG 1152
Glu Leu Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys Ser
370 375 380

AAC CTT AGT TTC CGC CTG TAC GAC CAG TGG CGT GCT TGG ATG CAG AAG 1200
Asn Leu Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln Lys
385 390 395 400

TCA CAC AAG ACC CGA AAC CAG GAC GAG GGG ATC CTG CCT TCG GGC AGA 1248
Ser His Lys Thr Arg Asn Gln Asp Glu Gly Ile Leu Pro Ser Gly Arg
405 410 415

CGG GGT GCG GCG AGA GGT CCT GCC GGT TAAACTCTAA GGATAGGCCA 1295
Arg Gly Ala Ala Arg Gly Pro Ala Gly
420 425

TCCTCCTGCT GGGTCAGACC TGGAGGCTCA CCTGAATTGG AGCCCCTCTG TACCATCTGG 1355

GCAACAAAGA AACCTACCAG AGGCTGGGGC ACAATGAGCT CCCACAACCA CAGCTTTGGT 1415

CCACATGATG GTCACACTTG GATATACCCC AGTGTGGGTA AGGTTGGGGT ATTGCAGGGC 1475

CTCCCAACAA TCTCTTTAAA TAAATAAAGG AGTTGTTCAG GTAAAAAAAA AAAAAAAAAA 1535

AAAAAAAAAA AAAA 1549



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 425 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Met Pro Ala Gly Arg Pro Gly Pro Val Ala Gln Ser Ala Arg Arg Pro
1 5 10 15

Pro Arg Pro Leu Ser Ser Leu Trp Ser Pro Leu Leu Leu Cys Val Leu
20 25 30

Gly Val Pro Arg Gly Gly Ser Gly Ala His Thr Ala Val Ile Ser Pro
35 40 45

Gln Asp Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser
50 55 60

Ile His Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr
65 70 75 80

Leu Asn Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr
85 90 95

Ser Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln Gln
100 105 110

Ser Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu Ala
115 120 125

Gly Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn Ile
130 135 140

Ser Cys Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro
145 150 155 160

Gly Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr
165 170 175

Lys Leu Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His Thr
180 185 190

Val Gly Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe Thr
195 200 205

Pro Tyr Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala Arg
210 215 220

Ser Asp Val Leu Thr Leu Asp Val Leu Asp Val Val Thr Thr Asp Pro
225 230 235 240

Pro Pro Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln Leu
245 250 255

Ser Val Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe Gln
260 265 270

Ala Lys Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp Lys
275 280 285

Val Val Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly Leu
290 295 300

Lys Pro Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly
305 310 315 320

Ile Tyr Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His Pro
325 330 335

Thr Ala Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly Gly
340 345 350

Val Cys Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg Arg
355 360 365

Glu Leu Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys Ser
370 375 380

Asn Leu Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln Lys
385 390 395 400

Ser His Lys Thr Arg Asn Gln Asp Glu Gly Ile Leu Pro Ser Gly Arg
405 410 415

Arg Gly Ala Ala Arg Gly Pro Ala Gly
420 425



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 938 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..468

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

GGC ACC GTT TAC TTC GTC CAA GTG CGT TGT AAC CCA TTC GGG ATC TAT 48
Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly Ile Tyr
1 5 10 15

GGG TCG AAA AAG GCG GGA ATC TGG AGC GAG TGG AGC CAC CCC ACC GCT 96
Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His Pro Thr Ala
20 25 30

GCC TCC ACC CCT CGA AGT GAG CGC CCG GGC CCG GGC GGC GGG GTG TGC 144
Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly Gly Val Cys
35 40 45

GAG CCG CGG GGC GGC GAG CCC AGC TCG GGC CCG GTG CGG CGC GAG CTC 192
Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg Arg Glu Leu
50 55 60

AAG CAG TTC CTC GGC TGG CTC AAG AAG CAC GCA TAC TGC TCG AAC CTT 240
Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys Ser Asn Leu
65 70 75 80

AGT TTC CGC CTG TAC GAC CAG TGG CGT GCT TGG ATG CAG AAG TCA CAC 288
Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln Lys Ser His
85 90 95

AAG ACC CGA AAC CAG GTA GGA AAG TTG GGG GAG GCT TGC GTG GGG GGT 336
Lys Thr Arg Asn Gln Val Gly Lys Leu Gly Glu Ala Cys Val Gly Gly
100 105 110

AAA GGA GCA GAG GAA GAG AGA GAC CCG GGT GAG CAG CCT CCA CAA CAC 384
Lys Gly Ala Glu Glu Glu Arg Asp Pro Gly Glu Gln Pro Pro Gln His
115 120 125

CGC ACT CTT CTT TCC AAG CAC AGG ACG AGG GGA TCC TGC CCT CGG GCA 432
Arg Thr Leu Leu Ser Lys His Arg Thr Arg Gly Ser Cys Pro Arg Ala
130 135 140

GAC GGG GTG CGG CGA GAG GTA AGG GGG TCT GGG TGAGTGGGGC CTACAGCAGT 485
Asp Gly Val Arg Arg Glu Val Arg Gly Ser Gly
145 150 155

CTAGATGAGG CCCTTTCCCC TCCTTCGGTG TTGCTCAAAG GGATCTCTTA GTGCTCATTT 545 

CACCCACTGC AAAGAGCCCC AGGTTTTACT GCATCATCAA GTTGCTGAAG GGTCCAGGCT 605

TAATGTGGCC TCTTTTCTGC CCTCAGGTCC TGCCGGCTAA ACTCTAAGGA TAGGCCATCC 665

TCCTGCTGGG TCAGACCTGG AGGCTCACCT GAATTGGAGC CCCTCTGTAC CTATCTGGGC 725

AACAAAGAAA CCTACCATGA GGCTGGGGCA CAATGAGCTC CCACAACCAC AGCTTTGGTC 785

CACATGATGG TCACACTTGG ATATACCCCA GTGTGGGTAA GGTTGGGGTA TTGCAGGGCC 845

TCCCAACAAT CTCTTTAAAT AAATAAAGGA GTTGTTCAGG TAAAAAAAAA AAAAAAAAAA 905

AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAA 938

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 155 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly Ile Tyr
1 5 10 15

Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His Pro Thr Ala
20 25 30

Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly Gly Val Cys
35 40 45

Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg Arg Glu Leu
50 55 60

Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys Ser Asn Leu
65 70 75 80

Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln Lys Ser His
85 90 95

Lys Thr Arg Asn Gln Val Gly Lys Leu Gly Glu Ala Cys Val Gly Gly
100 105 110

Lys Gly Ala Glu Glu Glu Arg Asp Pro Gly Glu Gln Pro Pro Gln His
115 120 125

Arg Thr Leu Leu Ser Lys His Arg Thr Arg Gly Ser Cys Pro Arg Ala
130 135 140

Asp Gly Val Arg Arg Glu Val Arg Gly Ser Gly
145 150 155



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 834 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..834

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

CCC ACC CTT CTC ATC GGC TCC TCC CTG CAA GCT ACC TGC TCT ATA CAT 98
Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser Ile His
51 55 60 65

GGA GAC ACA CCT GGG GCC ACC GCT GAG GGG CTC TAC TGG ACC CTC AAT 146
Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr Leu Asn
70 75 80

GGT CGC CGC CTG CCC TCT GAG CTG TCC CGC CTC CTT AAC ACC TCC ACC 194
Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr Ser Thr
85 90 95

CTG GCC CTG GCC CTG GCT AAC CTT AAT GGG TCC AGG CAG CAG TCA GGA 242
Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln Gln Ser Gly
100 105 110

GAC AAT CTG GTG TGT CAC GCC CGA GAC GGC AGC ATT CTG GCT GGC TCC 290
Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu Ala Gly Ser
115 120 125 130

TGC CTC TAT GTT GGC TTG CCC CCT GAG AAG CCC TTT AAC ATC AGC TGC 338
Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn Ile Ser Cys
135 140 145

TGG TCC CGG AAC ATG AAG GAT CTC ACG TGC CGC TGG ACA CCG GGT GCA 386
Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro Gly Ala
150 155 200

CAC GGG GAG ACA TTC TTA CAT ACC AAC TAC TCC CTC AAG TAC AAG CTG 434
His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr Lys Leu
205 210 215

AGG TGG TAC GGT CAG GAT AAC ACA TGT GAG GAG TAC CAC ACT GTG GGG 482
Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His Thr Val Gly
220 225 230

CCC CAC TCA TGC CAT ATC CCC AAG GAC CTG GCC CTC TTC ACT CCC TAT 530
Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe Thr Pro Tyr
235 240 245 250

GAG ATC TGG GTG GAA GCC ACC AAT CGC CTA GGC TCA GCA AGA TCT GAT 578
Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala Arg Ser Asp
255 260 265

GTC CTC ACA CTG GAT GTC CTG GAC GTG GTG ACC ACG GAC CCC CCA CCC 626
Val Leu Thr Leu Asp Val Leu Asp Val Val Thr Thr Asp Pro Pro Pro
270 275 280

GAC GTG CAC GTG AGC CGC GTT GGG GGC CTG GAG GAC CAG CTG AGT GTG 674
Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln Leu Ser Val
285 290 295

CGC TGG GTC TCA CCA CCA GCT CTC AAG GAT TTC CTC TTC CAA GCC AAG 722
Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe Gln Ala Lys
300 305 310

TAC CAG ATC CGC TAC CGC GTG GAG GAC AGC GTG GAC TGG AAG GTG GTG 770
Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp Lys Val Val
315 320 325 330

GAT GAC GTC AGC AAC CAG ACC TCC TGC CGT CTC GCG GGC CTG AAG CCC 818
Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly Leu Lys Pro
335 340 345

GGC ACC GTT TAC TTC GTC CAA GTG CGT TGT AAC CCA TTC GGG ATC TAT 866
Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly Ile Tyr
350 355 360

GGG TCG AAA AAG GCG GGA 894
Gly Ser Lys Lys Ala Gly
365



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 278 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser Ile His
51 55 60 65

Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr Leu Asn 
70 75 80 

Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr Ser Thr
85 90 95

Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln Gln Ser Gly
100 105 110

Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu Ala Gly Ser
115 120 125 130

Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn Ile Ser Cys
135 140 145

Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro Gly Ala
150 155 200

His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr Lys Leu
205 210 215

Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His Thr Val Gly
220 225 230

Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe Thr Pro Tyr
235 240 245 250

Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala Arg Ser Asp
255 260 265

Val Leu Thr Leu Asp Val Leu Asp Val Val Thr Thr Asp Pro Pro Pro
270 275 280

Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln Leu Ser Val
285 290 295

Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe Gln Ala Lys
300 305 310

Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp Lys Val Val
315 320 325 330

Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly Leu Lys Pro
335 340 345

Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe Gly Ile Tyr
350 355 360

Gly Ser Lys Lys Ala Gly
365



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 143 base pairs

(B) TYPE: nucleic acids

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

GGCATGAAGG CTTAGGGTGG GGATCGGTAG GACCCATGCA CCCAGAGAAA GGGACTGGTG 60

GCAACTTTCA AACTCTCTGG GGAAGGAAGA AGGGCTGAAA GAGG 104

ATG AAC GGG CTC AGA CAC AGC TGT AAT CAG CCC CCA GGA 143
Met Asn Gly Leu Arg His Ser Cys Asn Gln Pro Pro Gly
5 10

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids

(B) TYPE: amino acids

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

Met Asn Gly Leu Arg His Ser Cys Asn Gln Pro Pro Gly
5 10
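
The 13-residue peptide of SEQ ID NO: 21 can be checked against the coding portion of SEQ ID NO: 20 (nucleotides 105 to 143) by translating the codons with the standard genetic code. The following is a minimal illustrative sketch, not part of the listing itself; the dictionary contains only the codons that occur in this fragment, and the names `CODON_TO_AA`, `SEQ20_CODING`, `SEQ21` and `translate` are hypothetical helpers introduced here. Glutamine is written with the standard abbreviation Gln.

```python
# Illustrative check only: translate the coding part of SEQ ID NO: 20 and
# compare it with SEQ ID NO: 21. The table is a subset of the standard
# genetic code covering just the codons present in this fragment.
CODON_TO_AA = {
    "ATG": "Met", "AAC": "Asn", "GGG": "Gly", "CTC": "Leu", "AGA": "Arg",
    "CAC": "His", "AGC": "Ser", "TGT": "Cys", "AAT": "Asn", "CAG": "Gln",
    "CCC": "Pro", "CCA": "Pro", "GGA": "Gly",
}

# Codons as printed in the listing for nucleotides 105-143 of SEQ ID NO: 20.
SEQ20_CODING = "ATG AAC GGG CTC AGA CAC AGC TGT AAT CAG CCC CCA GGA"

# Residues as printed for SEQ ID NO: 21.
SEQ21 = "Met Asn Gly Leu Arg His Ser Cys Asn Gln Pro Pro Gly"


def translate(spaced_codons: str) -> str:
    """Translate whitespace-separated codons into three-letter residues."""
    return " ".join(CODON_TO_AA[codon] for codon in spaced_codons.split())


if __name__ == "__main__":
    print(translate(SEQ20_CODING) == SEQ21)
```

The same approach extends to the longer annotated sequences if the full 64-codon table is supplied.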
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(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1930 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

GGCACGAGCT TCGCTGTCCG CGCCCAGTGA CGCGCGTGCG GACCCGAGCC CCAATCTGCA 60

CCCCGCAGAC TCGCCCCCGC CCCATACCGG CGTTGCAGTC ACCGCCCGTT GCGCGCCACC 120

CCCAATGCCC GCGGGTCGCC CGGGCCCCGT CGCCCAATCC GCGCGGCGGC CGCCGCGGCC 180

GCTGTCCTCG CTGTGGTCGC CTCTGTTGCT CTGTGTCCTC GGGGTGCCTC GGGGCGGATC 240

GGGAGCCCAC ACAGCTGTAA TCAGCCCCCA GGACCCCACC CTTCTCATCG GCTCCTCCCT 300

GCAAGCTACC TGCTCTATAC ATGGAGACAC ACCTGGGGCC ACCGCTGAGG GGCTCTACTG 360

GACCCTCAAT GGTCGCCGCC TGCCCTCTGA GCTGTCCCGC CTCCTTAACA CCTCCACCCT 420

GGCCCTGGCC CTGGCTAACC TTAATGGGTC CAGGCAGCAG TCAGGAGACA ATCTGGTGTG 480

TCACGCCCGA GACGGCAGCA TTCTGGCTGG CTCCTGCCTC TATGTTGGCT TGCCCCCTGA 540

GAAGCCCTTT AACATCAGCT GCTGGTCCCG GAACATGAAG GATCTCACGT GCCGCTGGAC 600

ACCGGGTGCA CACGGGGAGA CATTCTTACA TACCAACTAC TCCCTCAAGT ACAAGCTGAG 660

GTGGTACGGT CAGGATAACA CATGTGAGGA GTACCACACT GTGGGCCCTC ACTCATGCCA 720

TATCCCCAAG GACCTGGCCC TCTTCACTCC CTATGAGATC TGGGTGGAAG CCACCAATCG 780

CCTAGGCTCA GCAAGATCTG ATGTCCTCAC ACTGGATGTC CTGGACGTGG TGACCACGGA 840

CCCCCCACCC GACGTGCACG TGAGCCGCGT TGGGGGCCTG GAGGACCAGC TGAGTGTGCG 900

CTGGGTCTCA CCACCAGCTC TCAAGGATTT CCTCTTCCAA GCCAAGTACC AGATCCGCTA 960

CCGCGTGGAG GACAGCGTGG ACTGGAAGGT GGTGGATGAC GTCAGCAACC AGACCTCCTG 1020

CCGTCTCGCG GGCCTGAAGC CCGGCACCGT TTACTTCGTC CAAGTGCGTT GTAACCCATT 1080

CGGGATCTAT GGGTCGAAAA AGGCGGGAAT CTGGAGCGAG TGGAGCCACC CCACCGCTGC 1140

CTCCACCCCT CGAAGTGAGC GCCCGGGCCC GGGCGGCGGG GTGTGCGAGC CGCGGGGCGG 1200

CGAGCCCAGC TCGGGCCCGG TGCGGCGCGA GCTCAAGCAG TTCCTCGGCT GGCTCAAGAA 1260

GCACGCATAC TGCTCGAACC TTAGTTTCCG CCTGTACGAC CAGTGGCGTG CTTGGATGCA 1320

GAAGTCACAC AAGACCCGAA ACCAGGTAGG AAAGTTGGGG GAGGCTTGCG TGGGGGGTAA 1380

AGGAGCAGAG GAAGAGAGAG ACCCGGGTGA GCAGCCTCCA CAACACCGCA CTCTTCTTTC 1440

CAAGCACAGG ACGAGGGGAT CCTGCCCTCG GGCAGACGGG GTGCGGCGAG AGGTAAGGGG 1500

GTCTGGGTGA GTGGGGCCTA CAGCAGTCTA GATGAGGCCC TTTCCCCTCC TTCGGTGTTG 1560

CTCAAAGGGA TCTCTTAGTG CTCATTTCAC CCACTGCAAA GAGCCCCAGG TTTTACTGCA 1620

TCATCAAGTT GCTGAAGGGT CCAGGCTTAA TGTGGCCTCT TTTCTGCCCT CAGGTCCTGC 1680

CGGCTAAACT CTAAGGATAG GCCATCCTCC TGCTGGGTCA GACCTGGAGG CTCACCTGAA 1740

TTGGAGCCCC TCTGTACCTA TCTGGGCAAC AAAGAAACCT ACCATGAGGC TGGGGCACAA 1800

TGAGCTCCCA CAACCACAGC TTTGGTCCAC ATGATGGTCA CACTTGGATA TACCCCAGTG 1860

TGGGTAAGGT TGGGGTATTG CAGGGCCTCC CAACAATCTC TTTAAATAAA TAAAGGAGTT 1920

GTTCAGGTAA 1930



(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 560 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

TCCAGGCAGC GGTCGGGGGA CAACCTCGTG TGCCACGCCC GTGACGGCAG CATCCTGGCT 60

GGCTCCTGCC TCTATGTTGG CCTGCCCCCA GAGAAACCCG TCAACATCAG CTGCTGGTCC 120

AAGAACATGA AGGACTTGAC CTGCCGCTGG ACGCCAGGGG CCCACGGGGA GACCTTCCTC 180

CACACCAACT ACTCCCTCAA GTACAAGCTT AGGTGGTATG GCCAGGACAA CACATGTGAG 240

GAGTACCACA CAGTGGGGCC CCACTCCTGC CACATCCCCA AGGACCTGGC TCTCTTTACG 300

CCCTATGAGA TCTGGGTGGA GGCCACCAAC CGCCTGGGCT CTGCCCGCTC CGATGTACTC 360

ACGCTGGATA TCCTGGATGT GGTGACCACG GACCCCCCGC CCGACGTGCA CGTGAGCCGC 420

GTCGGGGGCC TGGAGGACCA GCTGAGCGTG CGCTGGGTGT CGCCACCCGC CCTCAAGGAT 480

TTCCTTTTTC AAGCCAAATA CCAGATCCGC TACCGAGTGG AGGACAGTGT GGAATGGAAG 540

GTGGTGGACG ATGTGAGCAA 560



(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1391 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..1053

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

ACC CTC AAC GGG CGC CGC CTG CCC CCT GAG CTC TCC CGT GTA CTC AAC 48
Thr Leu Asn Gly Arg Arg Leu Pro Pro Glu Leu Ser Arg Val Leu Asn
1 5 10 15

GCC TCC ACC TTG GCT CTG GCC CTG GCC AAC CTC AAT GGG TCC AGG CAG 96
Ala Ser Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln
20 25 30

CGG TCG GGG GAC AAC CTC GTG TGC CAC GCC CGT GAC GGC AGC ATC CTG 144
Arg Ser Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu
35 40 45

GCT GGC TCC TGC CTC TAT GTT GGC CTG CCC CCA GAG AAA CCC GTC AAC 192
Ala Gly Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Val Asn
50 55 60

ATC AGC TGC TGG TCC AAG AAC ATG AAG GAC TTG ACC TGC CGC TGG ACG 240
Ile Ser Cys Trp Ser Lys Asn Met Lys Asp Leu Thr Cys Arg Trp Thr
65 70 75 80

CCA GGG GCC CAC GGG GAG ACC TTC CTC CAC ACC AAC TAC TCC CTC AAG 288
Pro Gly Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys
85 90 95

TAC AAG CTT AGG TGG TAT GGC CAG GAC AAC ACA TGT GAG GAG TAC CAC 336
Tyr Lys Leu Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His
100 105 110

ACA GTG GGG CCC CAC TCC TGC CAC ATC CCC AAG GAC CTG GCT CTC TTT 384
Thr Val Gly Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe
115 120 125

ACG CCC TAT GAG ATC TGG GTG GAG GCC ACC AAC CGC CTG GGC TCT GCC 432
Thr Pro Tyr Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala
130 135 140

CGC TCC GAT GTA CTC ACG CTG GAT ATC CTG GAT GTG GTG ACC ACG GAC 480
Arg Ser Asp Val Leu Thr Leu Asp Ile Leu Asp Val Val Thr Thr Asp
145 150 155 160

CCC CCG CCC GAC GTG CAC GTG AGC CGC GTC GGG GGC CTG GAG GAC CAG 528
Pro Pro Pro Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln
165 170 175

CTG AGC GTG CGC TGG GTG TCG CCA CCC GCC CTC AAG GAT TTC CTC TTT 576
Leu Ser Val Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe
180 185 190

CAA GCC AAA TAC CAG ATC CGC TAC CGA GTG GAG GAC AGT GTG GAC TGG 624
Gln Ala Lys Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp
195 200 205

AAG GTG GTG GAC GAT GTG AGC AAC CAG ACC TCC TGC CGC CTG GCC GGC 672
Lys Val Val Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly
210 215 220

CTG AAA CCC GGC ACC GTG TAC TTC GTG CAA GTG CGC TGC AAC CCC TTT 720
Leu Lys Pro Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe
225 230 235 240

GGC ATC TAT GGC TCC AAG AAA GCC GGG ATC TGG AGT GAG TGG AGC CAC 768
Gly Ile Tyr Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His
245 250 255

CCC ACA GCC GCC TCC ACT CCC CGC AGT GAG CGC CCG GGC CCG GGC GGC 816
Pro Thr Ala Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly
260 265 270

GGG GCG TGC GAA CCG CGG GGC GGA GAG CCG AGC TCG GGG CCG GTG CGG 864
Gly Ala Cys Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg
275 280 285

CGC GAG CTC AAG CAG TTC CTG GGC TGG CTC AAG AAG CAC GCG TAC TGC 912
Arg Glu Leu Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys
290 295 300

TCC AAC CTC AGC TTC CGC CTC TAC GAC CAG TGG CGA GCC TGG ATG CAG 960
Ser Asn Leu Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln
305 310 315 320

AAG TCG CAC AAG ACC CGC AAC CAG CAC AGG ACG AGG GGA TCC TGC CCT 1008
Lys Ser His Lys Thr Arg Asn Gln His Arg Thr Arg Gly Ser Cys Pro
325 330 335

CGG GCA GAC GGG GCA CGG CGA GAG GTC CTG CCA GAT AAG CTG TAGGGGCTCA 1060
Arg Ala Asp Gly Ala Arg Arg Glu Val Leu Pro Asp Lys Leu
340 345 350

GGCCACCCTC CCTGCCACGT GGAGACGCAG AGGCCGAACC CAAACTGGGG CCACCTCTGT 1120

ACCCTCACTT CAGGGCACCT GAGCCCCTCA GCAGGAGCTG GGGTGGCCCC TGAGCTCCAA 1180

CGGCCATAAC AGCTCTGACT CCCACGTGAG GCCACCTTTG GGTGCACCCC AGTGGGTGTG 1240

TGTGTGTGTG TGAGGGTTGG TTGAGTTGCC TAGAACCCCT GCCAGGGCTG GGGGTGAGAA 1300

GGGGAGTCAT TACTCCCCAT TACCTAGGGC CCCTCCAAAA GAGTCCTTTT AAATAAATGA 1360

GCTATTTAGG TGCAAAAAAA AAAAAAAAAA A 1391

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(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 350 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Thr Leu Asn Gly Arg Arg Leu Pro Pro Glu Leu Ser Arg Val Leu Asn
1 5 10 15

Ala Ser Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln
20 25 30

Arg Ser Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu
35 40 45

Ala Gly Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Val Asn
50 55 60

Ile Ser Cys Trp Ser Lys Asn Met Lys Asp Leu Thr Cys Arg Trp Thr
65 70 75 80

Pro Gly Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys
85 90 95

Tyr Lys Leu Arg Trp Tyr Gly Gln Asp Asn Thr Cys Glu Glu Tyr His
100 105 110

Thr Val Gly Pro His Ser Cys His Ile Pro Lys Asp Leu Ala Leu Phe
115 120 125

Thr Pro Tyr Glu Ile Trp Val Glu Ala Thr Asn Arg Leu Gly Ser Ala
130 135 140

Arg Ser Asp Val Leu Thr Leu Asp Ile Leu Asp Val Val Thr Thr Asp
145 150 155 160

Pro Pro Pro Asp Val His Val Ser Arg Val Gly Gly Leu Glu Asp Gln
165 170 175

Leu Ser Val Arg Trp Val Ser Pro Pro Ala Leu Lys Asp Phe Leu Phe
180 185 190

Gln Ala Lys Tyr Gln Ile Arg Tyr Arg Val Glu Asp Ser Val Asp Trp
195 200 205

Lys Val Val Asp Asp Val Ser Asn Gln Thr Ser Cys Arg Leu Ala Gly
210 215 220

Leu Lys Pro Gly Thr Val Tyr Phe Val Gln Val Arg Cys Asn Pro Phe
225 230 235 240

Gly Ile Tyr Gly Ser Lys Lys Ala Gly Ile Trp Ser Glu Trp Ser His
245 250 255

Pro Thr Ala Ala Ser Thr Pro Arg Ser Glu Arg Pro Gly Pro Gly Gly
260 265 270

Gly Ala Cys Glu Pro Arg Gly Gly Glu Pro Ser Ser Gly Pro Val Arg
275 280 285

Arg Glu Leu Lys Gln Phe Leu Gly Trp Leu Lys Lys His Ala Tyr Cys
290 295 300

Ser Asn Leu Ser Phe Arg Leu Tyr Asp Gln Trp Arg Ala Trp Met Gln
305 310 315 320

Lys Ser His Lys Thr Arg Asn Gln His Arg Thr Arg Gly Ser Cys Pro
325 330 335

Arg Ala Asp Gly Ala Arg Arg Glu Val Leu Pro Asp Lys Leu
340 345 350



(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

TCCAGGCAGC GGTCGGGGGA CAAC 24



(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

TTGCTCACAT CGTCCACCAC CTTC 24
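
Comparing the 24-base sequences of SEQ ID NOs: 26 and 27 with the 560-base sequence of SEQ ID NO: 23 suggests that they correspond to its two ends: SEQ ID NO: 26 matches the first 24 bases, and SEQ ID NO: 27 matches the reverse complement of the last 24 bases. This is an observation drawn from the sequences as printed, not a statement made in the listing. A minimal sketch of the check, with `reverse_complement` and the constant names as hypothetical helpers, might look like this:

```python
# Illustrative check only: compare SEQ ID NOs: 26 and 27 with the two ends
# of SEQ ID NO: 23, as printed in the listing (spaces removed).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


SEQ26 = "TCCAGGCAGCGGTCGGGGGACAAC"
SEQ27 = "TTGCTCACATCGTCCACCACCTTC"

# First and last 24 bases of SEQ ID NO: 23.
SEQ23_FIRST_24 = "TCCAGGCAGCGGTCGGGGGACAAC"
SEQ23_LAST_24 = "GAAGGTGGTGGACGATGTGAGCAA"

if __name__ == "__main__":
    print(SEQ26 == SEQ23_FIRST_24)
    print(reverse_complement(SEQ27) == SEQ23_LAST_24)
```

Because the comparison uses only the terminal 24 bases, it says nothing about the interior of SEQ ID NO: 23.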
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(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6663 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

CCCAGAACTC TTGGACGCTG AGGCAGGAGG ATTCCCAAGT TTCAAGACAG TGTGTTTCTA 60

GGTAATGAGA CCCTGTCAAG AAAAGAAAAG AAATAAAGAG ACAAGAAAAT GTTTATAGGC 120

TGTGAGACAG CTTGGTGGGT AAGGGGCACT TGCCTCCAAT CAAGATGACC TCAGCCCCAT 180

CCCTAGGAAT CCATGGTAGA AGGAGAAAGC AAACTCGCAG CTGCTGACCT CCATACATGT 240

GCTCCAATGT GCACACACAC AGGGAGACAT AATCAATTAA TAGGATGTAT TTGCTTAGAT 300

TTGAGTAGGC ATTTATGACT GATGTTTTAA AATTTTTATT TGATTTTATG AAAATATACC 360

TGTTTGTATT TGGTTTGGTT TGGTTTGAGT TTTGTTTATT TGAGACAGGG CTTCTCTGTG 420

TAGTCCTGGC TGTCCTTGGA ACTCACTCTG TAGACCAGGC TGGCCTTGAA CTCAGAAATC 480

CGCCTGCTTG TGCTTCCCAA GTGCTTAGAT TAAAGGTGTG CACTGCCATT CAGCAAAATT 540

GCATACTTTA ACCCCAGTAT TTGGGAGGCA GAGGCAGACT AATGTGTGAA TTCCAGGCTA 600

GCCAAGGATA CAGAGTGAGA CCCTATTCTT ACCCTCCCCC CCCAAAACCC CAAAATGTAT 660

TTTGTGCTTG TGTATGTACA TGTGTGTTGC AGCACGTAAA TGTCCAAGGA CAACTTGTAG 720

AAGTTCTCTC CGTTCACAGT CTAAGTCCTG AATTCAAACT AAGGTCCTCA GGCTTAGCCA 780

CAGTCTTCTT TATGTACTGA GCCATTTCAC TGGCCCTGGA TTGACTGATG AATTAATTTT 840

TGAGATAAGG TCTCTTGTAG CTCTAGCTAG GCTCAAACTA TGAACTCCCA AGGTCATCTT 900

GAGCTGCTGG TACTCTTGCT TCCACCCCAA GTGGTGGAAT GATACTCAGG CAGCACTTCT 960

CTGGGGAAGG GGCTGGCCTT GGCCTTGATT TTGTTGCCTC AGCTTCAATG AGTGCTTGGG 1020

TCTCGTTGTT TCTTTTCTTT ATCTGTGAAA TGGGTGAACA CCTGTTCAAG ACTTCCTGAC 1080

TCTTGAAACA TCCAGGCAGG GTGAGGGACT TGAAGTGGGC TCATCCCATG CCTAACAAAG 1140

TGTCGTCTTT GACCCCAGAC ACAGCTGTAA TCAGCCCCCA GGACCCCACC CTTCTCATCG 1200

GCTCCTCCCT GCAAGCTACC TGCTCTATAC ATGGAGACAC ACCTGGGGCC ACCGCTGAGG 1260

GGCTCTACTG GACCTTCAAT GGTCGCCGCC TGCCCTCTGA GCTGTCCCGC CTCCTTAACA 1320

CCTCCACCCT GGCCCTGGCC CTGGCTAACC TTAATGGGTC CAGGCAGCAG TCAGGAGACA 1380

ATCTGGTGTG TCACGCCCGA GACGGCAGCA TTCTGGCTGG CTCCTGCCTC TATGTTGGCT 1440

GTAAGTGGGG CCCCAGACAC TCAGAGATAG ATGGGGGTTG GCAATGACAG ATTTAGAGCC 1500

TGGGTCTTCT GTCCTGGGGC AGAGCCATGG GCTCTCACTT GCATGCAGGC ATGGTCATAC 1560

CCAGCACAGG CATTGCAACT CTAGGGACAG CTGTGGCTGC ACTGTCCCCT GTGTACCCCA 1620

CAGCTTTAGA AAAGCTGTCA TGTTTTCCTT GTAGTGCCCC CTGAGAAGCC CTTTAACATC 1680

AGCTGCTGGT CCCGGAACAT GAAGGATCTC ACGTGCCGCT GGACACCGGG TGCACACGGG 1740

GAGACATTCT TACATACCAA CTACTCCCTC AAGTACAAGC TGAGGTTGGT ACCCAGCCAA 1800

GCCTTGCTGT GTGACTTCTG GCAATACTTA CCTTCTCTGA TCAAATATGT TCCTGTTTAT 1860

GAACTCAAAA GGGACTCTCG CACCTCCACA GGTGGTACGG TCAGGATAAC ACATGTGAGG 1920

AGTACCACAC TGTGGGCCCT CACTCATGCC ATATCCCCAA GGACCTGGCC CTCTTCACTC 1980

CCTATGAGAT CTGGGTGGAA GCCACCAATC GCCTAGGCTC AGCAAGATCT GATGTCCTCA 2040

CACTGGATGT CCTGGACGTG GGTGAGCCCC CAGTGTCCAC CTGTGTTCTG CCCTAGACCT 2100

TATAGGGCGC CTCCCCCCCA TCCCCCCAGA CTTTTTGGTT CTTCTAGAGG TCTTAGCCAC 2160

AGCCACGGTG GTTGCAGGAC AGTGGTTGTT CATAACTTAA TGCAAAGACT TTCCCCCAAG 2220

ACAGTCAAGA TTTTTCCCCT CCCCACCCCC AACACACACA TACACACACA CTCTGCAGAG 2280

AACACCTGGC CTGACCACCC TCCCTCTCTA CAGCCCAGGT GTTCAGAAGG GAGTCCTAGG 2340

GGACTGAGAG GAGGCGCCCA GGTCTGAAGG CGCCCCAGGA AGCCGAGGCC TTGAGCTGGG 2400

GGGGGGGGCG AGGGTTGGAG GCACGAACTG GATGATCCCT GAGCACAACT GGGCCTAATC 2460

TAATTAGGGT GTTCCCAGCC CAAAGCAGCC TGGGCCATTT AACCCTTCAA GTGCCTCACT 2520

GAAGACTCAG GGGAGAGATC AGCTTGTACT CTCTCCATGG TCCCCCAGGA GGGTTCCTGG 2580

GTGCCCCTGG CTCATTCCCA CATCCAGAGG TTTTGTGTCT TCCTGGCATC TAACCCTCAG 2640

TTGTGCTCTG TGGCTGGCAC AGCTGCCCCG TGGAGGCTCT TGGTAATGTA CAAGGCATCA 2700

GAGGTGGACA TGGGATGGGG ATACATAGGG ATGGAGCCAA ATAGCACCTC AAGGTGGGGT 2760

GATATACAAT AAAGCTTGTC ACCCTGACGC TCAGAAAGCC TACTCATGAT GATCACAATT 2820

GTTGACATCA CTCTGGGACA TGTAGTGAGA CCCTAGCTCA AAACACAGAC AGTAGCTTTA 2880

AGAGTCAGCT TGTGACTTAA TACTGGAACT CAGGGCCTAA TAGGTGCTGG GTGATGCTCG 2940

CCTCACTCCC TGTTTAGTGA GATCTCTGCG CTAATCTCCA CCCCAGCTGG GTGGGCTGCT 3000

CTGTCCCCTT GAGGGCAGGA ATGTGTGTCT TCCATCAGAG ATAGGACCCG TGGTAGCAGC 3060

AACTGCTGCT GGCTGTTTCT GGAATATTAA ATGACAGTAA TCTATCAGGC CTGGGTGAGT 3120

AGCTAACAGG GGTGGGGGCG TGGTCTGGAA AACGCAGATA GGGTCATAGG AGCCACTGCA 3180

GCCTAGATTA CACCACTGGG TGTTCTGTCA CTAGGCCATT CTCACCAAGC AGTCCTCAGA 3240

ACTGGGAGCA CTGTTGCCAG CATTTAATGC CAGCATTTAA TGCCAGCATT AGGGGAGGCA 3300

GAGGCAGAAG GATCTCTCTG AGTTCAAGGC CATCCTGAAT TTACATAAAG AGCTCCAGGC 3360

CAGCCAGGGT GCGCAGTAAA ACCTTGTCTC AAAAAACAAA GCATCTTTAG TGACCAGGCT 3420

TGCTCCACCC CCAGTGACCA CGGACCCCCC ACCCGACGTG CACGTGAGCC GCGTTGGGGG 3480

CCTGGAGGAC CAGCTGAGTG TGCGCTGGGT CTCACCACCA GCTCTCAAGG ATTTCCTCTT 3540

CCAAGCCAAG TACCAGATCC GCTACCGCGT GGAGGACAGC GTGGACTGGA AGGTGCCCGT 3600

CCCGCCCCGG ACCCGCCCCT GACCCCGCCC CCCGCATCTG ACTCCTCCCT CACCGTGCAG 3660

GTGGTGGATG ACGTCAGCAA CCAGACCTCC TGCCGTCTCG CGGGCCTGAA GCCCGGCACC 3720

GTTTACTTCG TCCAAGTGCG TTGTAACCCA TTCGGGATCT ATGGGTCGAA AAAGGCGGGA 3780

ATCTGGAGCG AGTGGAGCCA CCCCACCGCT GCCTCCACCC CTCGAAGTGG TGAGCACCTC 3840

TCCAGGGCTG GCTGGCCCAT GGAATCCCCA ATCCATCCTG TTCCTTCCCC CCCACCCTTT 3900

TTTTGAGACA GCGTCTTCAG GTAGCGCATG CTGGCCTTAA ATTCAGTATG TAGTCAAGGA 3960

TGACCTCGAG CTCCTGGTCT TTTTGTCTCC ACTTAGAGAC AATGGCCAGT GGCCATCACC 4020

ACCTTTGGGA GACTAGCCAT GGAGTCTATT TAGCCTGTCA TTTGGTGACA GATGGAGTAC 4080

AACAGTGTGA CCTCTTGTAA GAGAACTGAA GACAGGCTGT TTTTAACCCC AATATCCTAG 4140

GCTCTCTAGA GGTTAACTTT ATATAAAATA GAGACTATTA CAGCCAGTTA TCACATGGTC 4200

CCACAGAACC TTTTGTCACA CAACCTATAG ACCACAGTGC CTGTGCCTAC CACATAAGGG 4260

TCTCTACTGC TGGCCCACCC CTCCAACCCT TAAAAGGTAA CCTAGGCAGC CTTAATATTT 4320

GCAATCCTCC TACCTCAGCC TCTTGAATGC TCAGAAACCA GGCATTAACC CAAGTTTCTC 4380

TTCTCTGGGT CCCTTTCTTA AGGTGGGAGG GCCTAAAGAT GACTTCCTTT GTCCTGAAGA 4440

CTCTCCGAGC CCATGGATCT GCACTCTCTA ATATGAAATA TATTGCATAA AATGTCTGGC 4500

CTCAGTTTCC CCACCTGTCA GGTTTAGGCA GCACAGTCGG TCCAAGACAC TTCATTATTT 4560

GCAGGCAGTA TAAGAAGAAG CTCCCATCCC CCACCCGCTT CCTCCGGTCC CTAAGACAGA 4620

ATACTTCTAC ACTGAAACTG AACTCTCGCA GACGCATATG CTCACTTTAA TGATGATGAA 4680

ATAATGGGGA AACTGAGGCT CCGAGAGATT CCTGGAGGAA GAGGGTCAAA ACCAGCTCCA 4740

GGAAGCTCTC CAGCCCCCAT CCGGGCCTCT CCAGGTTCTG GGCTTGGCGG GAGTGAACAC 4800

AGGTGGGAGG GGCTGGAGCC TGGGAGCTTT GGCCCTTGCT CGTGCCCAGC ACCTGCGATT 4860

CTTGCACGGG AGCCAGCAGG CGGCTGCGTC CGCCCGAGAG ACTGAAGAAG CCGGGGGTAG 4920

GGTTGGAGGG AGGTAAGCAG GGGCTGTGGG GGCCGAAGCT TGTGCCAGGG CCTGTCAGCG 4980

AGTCCCCAGT TTTATTTATG GCGTGAGGCC GATGTCCTTA TCCGCTGGCC TGCTGGGGGA 5040

TGGCTGCGGC TGGGGATTGG ACCCAAGGGC TGGCTTCCCA CTCAGTCCTC CAGCCCACTC 5100

CATGTCACAC CCGTGCATTC TCTGAGGCTT ATCTTGGGAA CCCGCCCTTG TTCTGTGCTG 5160

TCTGTCTCTA TTTCTGTCAT TCACTTTCCC AGAGCCTTTT TTTTATGCTT TTAATATAAC 5220

TACGTTTTAA AAATTGCTTT TGTATAATGT GTGTGCCTTC GTGAGCGTGC GTGCCACAAC 5280

ACACACGTGA AGGTTAGAGA ACTTTGTTGA GTAGGCTCCT TCCACCATGT GGGACTAGGG 5340

CTGGCGACAA GAGCAATTAC TGAGTCATCT CGCCAGCCCC TCACCCCTCA CTTCCCATCC 5400

5 TGTTTGGATA GTCATAGGTA ATCGAAGGTA AATCGCTGGC TTTAATTTCG TAGCTATCCT 54 6 0 

GCCTCAGCCT ACCAAGTGCT GTGCTACCAC GTTTGTGGGA GGGGCTCTCC TCCCAGTGTC 5 520 

TGGGGGTGAC ACAGTCCCAA GATCTCTGCT TTCTAGGTCT TTGTCTTAGT TTGCCCCTTG 5 5 80 

CTTTGTCCGT GTCCCTAGAG TCTCCGGCCC CACTTATCCA TTGACTGGTC TTTCCTTTAC 564 0 

CGAATACTCG GTTTTACCTC CCACTGATTT GACTCCCTCC TTTGCTTGTC TCCATCGCCG 5 700 

15 TGGCATTGCC ATTCCTCTGG GTGACTCTGG GTCCACACCT GACACCTTTC CCAACTTTCC 5 76 0 

CCAGCCGAAG CTGGTCTGGT ATGGGAGGCC GCCGTCCCGC GCGCGCCTCC TGCTGGCCGC 5 82 0 

GCCCCAACAC TGCCGCTCCA TTCTCTTTAG AGCGCCCGGG CCCGGGCGGC GGGGTGTGCG 5 880 

20 

AGCCGCGGGG CGGCGAGCCC AGCTCGGGCC CGGTGCGGCG CGAGCTCAAG CAGTTCCTCG 5 940 

GCTGGCTCAA GAAGCACGCA TACTGCTCGA ACCTTAGTTT CCGCCTGTAC GACCAGTGGC 6000

GTGCTTGGAT GCAGAAGTCA CACAAGACCC GAAACCAGGT AGGAAAGTTG GGGGAGGCTT 6060

GCGTGGGGGG TAAAGGAGCA GAGGAAGAGA GAGACCCGGG TGAGCAGCCT CCACAACACC 6120 

GCACTCTTCT TTCCAAGCAC AGGACGAGGG GATCCTGCCC TCGGGCAGAC GGGGTGCGGC 6180 

30 

GAGAGGTAAG GGGGTCTGGG TGAGTGGGGC CTACAGCAGT CTAGATGAGG CCCTTTCCCC 6240

TCCTTCGGTG TTGCTCAAAG GGATCTCTTA GTGCTCATTT CACCCACTGC AAAGAGCCCC 630 0 

3 5 AGGTTTTACT GCATCATCAA GTTGCTGAAG GGTCCAGGCT TAATGTGGCC TCTTTTCTGC 6360 

CCTCAGGTCC TGCCGGCTAA ACTCTAAGGA TAGGCCATCC TCCTGCTGGG TCAGACCTGG 64 2 0 
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AGGCTCACCT GAATTGGAGC CCCTCTGTAC CATCTGGGCA ACAAAGAAAC CTACCAGAGG 64 8 0 

CTGGGCACAA TGAGCTCCCA CAACCACAGC TTTGGTCCAC ATGATGGTCA CACTTGGATA 654 0 

5 TACCCCAGTG TGGGTAGGGT TGGGGTATTG CAGGGCCTCC CAAGAGTCTC TTTAAATAAA 6600 

TAAAGGAGTT GTTCAGGTCC CGATGGCCAG TGTGTTTGGG GCCTATGTGC TGGGGTGGGG 66 60 






GGA 6663 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 186 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Asp Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser Ile
1               5                   10                  15

His Gly Asp Thr Pro Gly Ala Thr Ala Glu Gly Leu Tyr Trp Thr Phe
            20                  25                  30

Asn Gly Arg Arg Leu Pro Ser Glu Leu Ser Arg Leu Leu Asn Thr Ser
        35                  40                  45

Thr Leu Ala Leu Ala Leu Ala Asn Leu Asn Gly Ser Arg Gln Gln Ser
    50                  55                  60

Gly Asp Asn Leu Val Cys His Ala Arg Asp Gly Ser Ile Leu Ala Gly
65                  70                  75                  80

Ser Cys Leu Tyr Val Gly Leu Pro Pro Glu Lys Pro Phe Asn Ile Ser
                85                  90                  95

Cys Trp Ser Arg Asn Met Lys Asp Leu Thr Cys Arg Trp Thr Pro Gly
            100                 105                 110

Ala His Gly Glu Thr Phe Leu His Thr Asn Tyr Ser Leu Lys Tyr Lys
        115                 120                 125

Leu Arg Leu Val Arg Ser Gly *   His Met *   Gly Val Pro His Cys
    130                 135                 140

Gly Pro Ser Leu Met Pro Tyr Pro Gln Gly Pro Gly Pro Leu His Ser
145                 150                 155                 160

Leu *   Asp Leu Gly Gly Ser His Gln Ser Pro Arg Leu Ser Lys Ile
                165                 170                 175

*   Cys Pro His Thr Gly Cys Pro Gly Arg
            180                 185



(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

AGCTGGCGCG CCTCCCGGGC GGATCGGGAG CCCAC 35

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 28 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

AGCTACGCGT TTAGAGTTTA GCCGGCAG 28

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

Met Val Leu Ala Ser Ser Thr Thr Ser Ile His Thr Met Leu Leu Leu
1               5                   10                  15

Leu Leu Met Leu Phe His Leu Gly Leu Gln Ala Ser Ile Ser
            20                  25                  30

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

Ile Lys Pro Ser Gly Arg Arg Gly Ala Ala Arg Gly Pro Ala Gly Asp Tyr Lys Asp Asp
1               5                   10                  15                  20

Asp Asp Lys



(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 73 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

GATCTTGCCC TCGGGCAGAC GGGGTGCGGC GAGAGGTCCT GCCGGCGACT ACAAGGACGA 60

CGATGACAAG TAG 73

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 73 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

AACGGGAGCC CGTCTGCCCC ACGCCGCTCT CCAGGACGGC CGCTGATGTT CCTGCTGCTA 60

CTGTTCATCC TAG 73

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

CCCACGCTTC TCATCGGATT CTCCCTG 27

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

CAGTCCACAC TGTCCTCCAC TCGGTAG 27

(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11832 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

GCGGCCGCTG CAGTGATTAC TCACCGCGTG GCGCACCCCA CCCGCGGGCC GCTGAGTGGA 60 

TTTTTCCGTG GGGGGATGTG AAGAAGTTTA GGGAGAACTC TTCTGCACCG ATGGGAACTA 12 0 

GGAATGCAGG GTTCGGTCCC GTTCCCCAAA GGACACACCT CTCCCCATAA GCCCACTCAT 18 0 

AAGGGCTCCC TGCACGCGCT CCGGGACATC CCCATATCCA ATACCCGCAG ATATGATAGT 2 40 

TGAGAAGGGA CCAGAGGCCG GAGACTCCCT CCCTGCCTTC TGGCTTTCCC CCCCCCCTGC 3 00 

ACGAAACGAG ACTACAGCGA TGGGAGAGGT GGCATGAAGG CTTAGGGTGG GGATCGGTAG 3 60 

15 GACCCATGCA CCCAGAGAAA GGGACTGGTG GCAACTTTCA AACTCTCTGG GGAAGGAAGA 420 

AGGGCTGAAA GAGGATGAAC GGGCTCAGGT ACTGCTCAAT GTGTGTGTGG CGGACCAAAG 4 80 

TGGGTATGGG GGCCCCGTAA GAGGGGCGGG GAAGGTGGAT AGGAAGGATC CCGGTAGACT 54 0 

20 

GGAGGGGATC CTGGAAAAGC ACCAGGGCTG CGAGCTAGGA ACCCATTCGG AGTTAAGGGT 60 0 

ACAGGATCCC AGATGAGGGG GTGGGAAGCC TGGGACGGGC GGGACCAGAG AGGGAGGTCC 660 

2 5 CACGGGCTGG TGGGGAAAGA GTGGGGGGCT TCGCGCAGGA GGATGGGACG TTCAGGAGTG 72 0 

GTAACTGGGC GGAGGCCGGC CGGGCGGGGC GCGCGGTGCC CGCGGGCGGT GGGAAGGCCG 7 80 

GTGCGGGGCC CACGATCAAC CCCCCCCCAG GGGCCGGGCC GGGCCGGGGG CGGGGCCGGG 84 0 

30 

CGGGGCGAGC GGCGCATTAG CGCCTTGTCA ATTTCGGCTG CTCAGACTTG CTCCGGCCTT 90 0 

CGCTGTCCGC GCCCAGTGAC GCGCGTGAGG ACCCGAGCCC CAATCTGCAC CCCGCAGACT 960 

3 5 CGCCCCCGCC CCATACCGGC GTTGCAGTCA CCGCCCGTTG CGCGCCACCC CCATGCCCGC 1020 

GGGTCGCCCG GGCCCCGTCG CCCAATCCGC GCGGCGGCCG CCGCGGCCGC TGTCCTCGCT 1080 
GTGGTCGCCT CTGTTGCTCT GTGTCCTCGG GGTGCCTCGG GGCGGATCGG GAGCCCGTGA 1140

GTACCGTGCG CCCTGCTCCC CACCTCCCCA GGGAAGCCGG GATCCGGCGC CCCGGGGGGT 12 00 

AGTCGCGGGG GATGGAAGAA GGGGCGCGAG CGCCACCTGG ACGTCCCGGG AACAAAGGAA 1260 

GGCGGCCCTC GGGGCGCCCT CACCTGTGGG GCTCATGGCA CCACCACCCA GCCTCCCAAG 132 0 

AGTACCCCGT TATACATCAG AGGCCTCTTA TCTGTATCCC CTTTGCGAGG CTGTCTGGCC 13 80 

AGGCTCAGTT TGAAGGACAT CGCAGTGTCC TGGGACCCCC CTCCTTCAGG GTGCTGGGAC 144 0 

GCTTCGGGGC GCACGCCTGT GTCTTGGATA TCAGAGCGGA AGGGAAGCCT CCCTGGCCGG 150 0 

15 GGGCGCACGC TTGGGTGCGT TGGGTTGGGT GCTGGCGCAA AGTGGGGTCC CCTCCCCCAT 15 60 

GAAGTGATGA TCCCCGGGGG GAGGGTGGGG CGTTATCGTG AGCCCTCCTG TCCGCCTGGC 162 0 

ATGCGGCCCG GCGTCCCTCG GGACTTGCCT CTCCGTGGGG TCGGCGCCGC CCCCTCCCCC 168 0 

20 

CTATAGCAGA CTCCATGCTT TGGTATCCTC GAAGTCCTCT CCACTGGTGG GGCTCACAAC 174 0 

CGGTCTCATT CAGGCTGCGC TGGGTTGAGA GCCTCTAGCG ACTGAAATTT CGGTGAGGAG 1800

2 5 CGAGAGCAAG CGTGTCCGGG CACCGCGAGC CCAGACTTCA TTGTCTAAGG GGCACCCAGT 1860 

GGGGGTCAGC TGCCGAGAGA ATCCCACTGT CCCAGGAGGA ACTCCTGGCC TTGAGCCCCC 1920 

ATCACCCAAC GCACACATCC CCGCCAGGAT GCGGTCTCCA CATCCAGACC CTCTCTGGGA 1980 

30 

CACACCCAAA GACACACAAA AGAGCCCCAC TGGCTTATGT CCCGTCACCC TGCCCTCCGA 2 04 0 

CGCGCGCTGC AGCCCAGATG CGTATTCGCA CACCATCGCG GCGCTCGCAT TCCATCCTCT 2100 

3 5 ACACACACAC ACACACACAC ACACACACAC ACACACACAC ACACACAGAC ACGCACACAC 2160 

ACACGCACGC ACACACACGC ACGCCCGCAC TCGTGGTCCC ACATTTATTT CACAGGGGAG 2220 
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GCAACACCGG GGTACGCATA TGGTTGAGTG CACTGGAGAT CTTTCCCCAC CACTCTCAGG 22 80 

ACCCCATCCG GAGACACAGG CCACACCGCA GGGGCACCAC GCTGCGCTGC TGCTCTGGGC 2 34 0 

5 TAGTAGTCTT GTGCAGTTTG TCCGCGGTGT CTGTGGACGC CCTCCCGCTC TTGTCAGGGG 24 0 0 

ACAGGAACCT ACACTCCTGC TTGCCCAAGG CGGCTGGGCA GGTGATGTGG TGACACCCGG 24 6 0 

GACCTTTCCG GGGAGTTGGT GTTGCTGCCA AGCCTGGGTA GTTTTTGAAT GCCACCAATA 2 52 0 

GCGCTAAGCT TTGTTTCCGG GCGGGCTGCA GAGCAACAGG CGAAGGTGGC GGAGTGGGGG 25 80 

TGGCGCGTGT GTTTTTTCTT TTAAGGGGGA GAGAAATTAA ATAAGAGGTT CTCACACCTC 264 0 

15 TGCAATCTGT TTGTACTTAC CGTGTGTCTT AACACCTGAC CAGCCAGCCG GTGGGTCGTA 27 00 

AAAGTGTATG CAGGTACCAG CGGGACAGGA GATGGGGGCC CCTGGGGTAT GGCTGGGATG 27 60 

GAGGCCACCT TCCCGTTGGC CTTTCAGGGA ATCTCACACT TTTCCCTTTT AAAACACATG 2 8 20 

20 

GTGTTCTTTT TAATAACGGC AGCAACTCCG CATTGGGAAA GGGGGAAATA AGCTTGTATA 2 880 

GGCCCCGGCT TTGTGGAAAG GAGGGGAAGA GGGAAGAAAA AAGGAGGGGT GTCTCCTCCA 2 94 0 

2 5 GGCTTAGGGG GCTGTCAGCT GCTGCTCTGT CTAGCTTGGC ATGTGTGTGC CCCAGTCCCC 3 0 00 

AGTGGCTTTG GCCCATTGTT TGTGGAAGCC AAGAGGGAGA CTGGAGTCCT CTATCTCTGG 3 06 0 

TACTCCAGAG TCAGGCTTCT CAGTCCGAGC CCAGAGAACG TCTTCCCTGT TTTATGGAGG 312 0 

30 

GAATCAGGGA AGGGGGTGCC AGGTGGACTA CGTTCTGCTG AGGACTGTAC CAGTCGCTCG 3180 

AAGGAGAAAG CTTGGGCTTG CCCCCCTCCC CCCTCAAGCC ACGAAGGGCA GCTGCTAGGC 3 24 0 

3 5 TAGTGTGGTA AAAGGGCATT ACTCCCCAGC CAGGACCCCC CAGAGAGTCC CCTTCCTGGC 33 00 

CAGACAAATG CTGGGGAGGG ACAGAGGGGT GTGATCATTG CCCAGGAGTG CAGACAGTGG 3 3 60 
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GGTCCCGGGT CGGGCAGTGC CTCCCACCCT GCTGAGGGGG GCGCCCAGGC AGGAAGCGGT 34 20 

GGGTGGGCCG GGGTAGAGAC GCTGGCACGT CCCAGTTCAT GCCGAAGGAA TTCTGAATTA 34 80 

5 GCGGGCGGCT GGCTGCCTGG GACCTCCGGG GCGGCCCCCT GGCCCCCGCC GCTCCGTCTG 3 54 0 

GCCTGCTCCT CCTGCTCCTT CGCACGGACG CTGAGACCTC CGCTGAGCCC TGGGACAAGC 3600

CCCAAATGCA ACTGCGATTG CAGGCTTCGC AAGACCCGCC TCCTCCCAAG GCCAAATTTG 3660

CCTGGGAGAA GTCATTCAGG GCCCAGACTA GAACCATGTT GGTGCCACCT CATCCATCTG 3720

GGGCATGAAG GACCGTCCAG GGCTGCAGTT TAGCTTCTTA ATAGGAACCT GGGGGTGGGT 3780

GCAGCCTCTG TTCTCCGAGC CTCTTTGGAA ATCGGTTTTG TTTTTGTTTT TGTTTTTTCC 3840

AATACTCTTT TCCTCTCATC CCATCCCGGG ACTGTTTTCC TCCCTAAGGG TTGAGAGCCC 3 90 0 

TGCAGTCTTC CCTAACCTTT TCTTTGCTTC TACCCCAGGG CCTTTGCACA TGGAGTCCCA 3 960 

20 

CCTCTCCCCT TGCCCAACTG GGGCTCCAGC CTTACTGCAT TTGGCTCTTG GTAACTGTCC 4020

CAGGGCCTCT CTGACACACA GGGTTGTAGC CCCAGCTCCC TCTCTTCTCC TCCCCCCTTT 4080

CTCTTTTGCT TCTGAGACTT AATTTTTTTC TTTTTCTTTT TGGCTTTTTG AGACAGGGTT 4140

TCTCTGTACA GCCCTGGCTG CCCTGGCACT CATTCTGTAG ACCAGGCTAG CCTCAAACTC 4200

ACAAACCTAC CTGCCTCTGC CTTTCCAGTG CTGGCACTAA AGATGTGGGC CACCACAACT 4260

30 

AGTAGTTAAG TGTTTTGCTG TGTCTTTATT CCTATAGTGA CCTCAGTTCC TGGCATATTG 432 0 

TAGGCGATGG ATGGATGAAT GGATGGATGG ATGGATGGAT GGATGGTTGG ATGGAGCAAG 4 3 80 

3 5 CTTGAATCGT CCTGAGTGAA AAAAGAGACC TCAGAGAACT GAATGGAGTT AGGTTCCCAG 444 0 

GGCAGCCTGG CCTGCTGGTC TCATGGGAGC TCCCTGTGAA ACTTCCCCCA CACCTCCCAC 4 500 
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CACCCTGCCA TCCTGTGTGG CTGACAAGAA AGGCCAATGG CCAGATGGGG ACACAGACTC 4 560 

AGGGAAGCTT GGAATATGTT CCCCTCCTCA TATCCTAGGC CTTGTTGTCC CCCTGAGGGC 4 62 0 

5 CCAGCCTATG AGTAGGGCAG CTGTGGGCTG CCCTAAGGTT GGGTAGGCAA GAAGGGGGTG 4 680 

GTCCCTCAGG GTGGGTCACA GGATTGAGGT CATTTCCAAA GTGGCCATCA CAGTGGCCCT 4 74 0 

AGGAAATGAT TGTGGAGAGT CAGAACTCCT GTTGGGAGTT GTAGAGGGCC TTGCATGTGG 4 800 

GCTTCTGTGG CTGTCCCTTC TCTTGTGGTC CTTTGCACAG TCCCCTCGTG TGTGCTGGGA 4 860 

TGTGAGGAGG GCACGGGGAA AATGAAGGCT CAGCCCCTCA GCTTGCCCTT CACGGTTCAC 4 920 

15 CCAACAGGGC TCACCTCTCC TCTGGACAGG CTCTCACTGT ATGCACAGAT TGGCCTCACA 4 98 0 

TTTGATTCCC TTCCTTTGGT CTCCTGGGAT GACAAACATT TACCAGGGTA GGATTTTACA 504 0 

TTTTAGATAT GTCCATTCTC CAGAAACACA CTTGTGAGGT TAGGGTATCA GTGAAAGGAC 5100 

20 

ACCACCAGGA CAGACAAAGA ATTGGAGAGG AAGGAAATTG GTAAGCCAGG CCATGCTTGA 5160 

TGGCTTATGT GTAATCCCAG AACTCTGGAC GCTGAGGCAG GAGGATTCCA AGTTTCAAGA 5220 

2 5 CAGTGTGTTC TAGGTAATGA GACCCTGTCA AGAAAAGAAA AGAAATAAAG AGACAAGAAA 5280 

ATGTTTATAG GCTGTGAGAC AGCTTGGTGG GTAAGGGGCA CTTGCCTCCA ATCAAGATGA 5340

CCTCAGCCCC ATCCCTAGGA ATCCATGGTA GAAGGAGAAA GCAAACTCCA GCTGCTGACC 5400

30 

TCCATACATG TGCTCCAATG TGCACACACA CAGGGAGACA TAATCAATTA ATAGGATGTA 54 6 0 

TTTGCTTAGA TTTGAGTAGG CATTTATGAC TGATGTTTTA AAATTTTTAT TTGATTTTAT 552 0 

3 5 GAAAATATAC CTGTTTGTAT TTGGTTTGGT TTGGTTTGAG TTTTGTTTAT TTGAGACAGG 5580 

GCTTCTCTGT GTAGTCCTGG CTGTCCTTGG AACTCACTCT GTAGACCAGG CTGGCCTTGA 5640
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ACTCAGAAAT CCGCCTGCTT GTGCTTCCCA AGTGCTTAGA TTAAAGGTGT GCACTGCCAT 5 7 00 

TCAGCAAAAT TGCATACTTT AACCCCAGTA TTTGGGAGGC AGAGGCAGAC TAATGTGTGA 5760 

5 ATTCCAGGCT AGCCAAGGAT ACAGAGTGAG ACCCTATTCT TACCCTCCCC CCCCAAAACC 5 820 

CCAAAATGTA TTTTGTGCTT GTGTATGTAC ATGTGTGTTG CAGCACGTAA ATGTCGAAGG 5 88 0 

ACAACTTGTA GAAGTTCTCT CCGTTCACAG TCTAAGTCCT GAATTCAAAC TAAGGTCCTC 5 94 0 

10 

AGGCTTAGCC ACAGTCTTCT TTATGTACTG AGCCATTTCA CTGGCCCTGG ATTGACTGAT 6 000 

GAATTAATTT TTGAGATAAG GTCTCTTGTA GCTCTAGCTA GGCTCAAACT ATGAACTCCC 6 06 0 

15 AAGGTCATCT TGAGCTGCTG GTACTCTTGC TTCCACCCCA AGTGGTGGAA TGATACTCAG 6120 

GCAGCACTTC TCTGGGGAAG GGGCTGGCCT TGGCCTTGAT TTTGTTGCCT CAGCTTCAAT 6180 

GAGTGCTTGG GTCTCGTTGT TTCTTTTCTT TATCTGTGAA ATGGGTGAAC ACCTGTTCAA 6240 

20 

GACTTCCTGA CTCTTGAAAC ATCCAGGCAG GGTGAGGGAC TTGAAGTGGG CTCATCCCAT 63 00 

GCCTAACAAA GTGTCGTCTT TGACCCCAGA CACAGCTGTA ATCAGCCCCC AGGACCCCAC 6360

2 5 CCTTCTCATC GGCTCCTCCC TGCAAGCTAC CTGCTCTATA CATGGAGACA CACCTGGGGC 64 2 0 

CACCGCTGAG GGGCTCTACT GGACCTTCAA TGGTCGCCGC CTGCCCTCTG AGCTGTCCCG 64 80 

CCTCCTTAAC ACCTCCACCC TGGCCCTGGC CCTGGCTAAC CTTAATGGGT CCAGGCAGCA 6540 

30 

GTCAGGAGAC AATCTGGTGT GTCACGCCCG AGACGGCAGC ATTCTGGCTG GCTCCTGCCT 66 00 

CTATGTTGGC TGTAAGTGGG GCCCCAGACA CTCAGAGATA GATGGGGGTT GGCAATGACA 6660 

3 5 GATTTAGAGC CTGGGTCTTC TGTCCTGGGG CAGAGCCATG GGCTCTCACT TGCATGCAGG 6720 

CATGGTCATA CCCAGCACAG GCATTGCAAC TCTAGGGACA GCTGTGGCTG CACTGTCCCC 67 8 0 
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TGTGTACCCC ACAGCTTTAG AAAAGCTGTC ATGTTTTCCT TGTAGTGCCC CCTGAGAAGC 6 84 0 

CCTTTAACAT CAGCTGCTGG TCCCGGAACA TGAAGGATCT CACGTGCCGC TGGACACCGG 6 90 0 

5 GTGCACACGG GGAGACATTC TTACATACCA ACTACTCCCT CAAGTACAAG CTGAGGTTGG 6 960 

TACCCAGCCA AGCCTTGCTG TGTGACTTCT GGCAATACTT ACCTTCTCTG ATCAAATATG 7 02 0 

TTCCTGTTTA TGAACTCAAA AGGGACTCTC GCACCTCCAC AGGTGGTACG GTCAGGATAA 708 0 

CACATGTGAG GAGTACCACA CTGTGGGCCC TCACTCATGC CATATCCCCA AGGACCTGGC 714 0 

CCTCTTCACT CCCTATGAGA TCTGGGTGGA AGCCACCAAT CGCCTAGGCT CAGCAAGATC 7200 

15 TGATGTCCTC ACACTGGATG TCCTGGACGT GGGTGAGCCC CCAGTGTCCA CCTGTGTTCT 7 2 60 

GCCCTAGACC TTATAGGGCG CCTCCCCCCC ATCCCCCCAG ACTTTTTGGT TCTTCTAGAG 732 0 

GTCTTAGCCA CAGCCACGGT GGTTGCAGGA CAGTGGTTGT TCATAACTTA ATGCAAAGAC 73 80 

20 

TTTCCCCCAA GACAGTCAAG ATTTTCCCCT CCCCACCCCC AACACACACA TACACACACA 744 0 

CTCTGCAGAG AACACCTGGC CTGACCACCC TCCCTCTCTA CAGCCCAGGT GTTCAGAAGG 75 00 

2 5 GAGTCCTAGG GGACTGAGAG GAGGCGCCCA GGTCTGAAGG CGCCCCAGGA AGCCGAGGCC 75 60 

TTGAGCTGGG GGGGGGGGCG AGGGTTGGAG GCACGAACTG GATGATCCCT GAGCACAACT 7 62 0 

GGGCCTAATC TAATTAGGGT GTTCCCAGCC CAAAGCAGCC TGGGCCATTT AACCCTTCAA 7680 

30 

GTGCCTCACT GAAGACTCAG GGGAGAGATC AGCTTGTACT CTCTCCATGG TCCCCCAGGA 77 4 0 

GGGTTCCTGG GTGCCCCTGG CTCATTCCCA CATCCAGAGG TTTTGTGTCT TCCTGGCATC 7 80 0 

3 5 TAACCCTCAG TTGTGCTCTG TGGCTGGCAC AGCTGCCCCG TGGAGGCTCT TGGTAATGTA 7 86 0 

CAAGGCATCA GAGGTGGACA TGGGATGGGG ATACATAGGG ATGGAGCCAA ATAGCACCTC 7 920 
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AAGGTGGGGT GATATACAAT AAAGCTTGTC ACCCTGACGC TCAGAAAGCC TACTCATGAT 7 9 80 

GATCACAATT GTTGACATCA CTCTGGGACA TGTAGTGAGA CCCTAGCTCA AAACACAGAC 804 0 

5 AGTAGCTTTA AGAGTCAGCT TGTGACTTAA TACTGGAACT CAGGGCCTAA TAGGTGCTGG 8100 

GTGATGCTCG CCTCACTCCC TGTTTAGTGA GATCTCTGCG CTAATCTCCA CCCCAGCTGG 8160 

GTGGGCTGCT CTGTCCCCTT GAGGGCAGGA ATGTGTGTCT TCCATCAGAG ATAGGACCCG 8220 

TGGTAGCAGC AACTGCTGCT GGCTGTTTCT GGAATATTAA ATGACAGTAA TCTATCAGGC 82 80 

CTGGGTGAGT AGCTAACAGG GGTGGGGGCG TGGTCTGGAA AACGCAGATA GGGTCATAGG 83 4 0 

15 AGCCACTGCA GCCTAGATTA CACCACTGGG TGTTCTGTCA CTAGGCCATT CTCACCAAGC 84 00 

AGTCCTCAGA ACTGGGAGCA CTGTTGCCAG CATTTAATGC CAGCATTTAA TGCCAGCATT 8460 

AGGGGAGGCA GAGGCAGAAG GATCTCTCTG AGTTCAAGGC CATCCTGAAT TTACATAAAG 8520

AGCTCCAGGC CAGCCAGGGT GCGCAGTAAA ACCTTGTCTC AAAAAACAAA GCATCTTTAG 8580

TGACCAGGCT TGCTCCACCC CCAGTGACCA CGGACCCCCC ACCCGACGTG CACGTGAGCC 8640

GCGTTGGGGG CCTGGAGGAC CAGCTGAGTG TGCGCTGGGT CTCACCACCA GCTCTCAAGG 8700

ATTTCCTCTT CCAAGCCAAG TACCAGATCC GCTACCGCGT GGAGGACAGC GTGGACTGGA 876 0 

AGGTGCCCGT CCCGCCCCGG ACCCGCCCCT GACCCCGCCC CCCGCATCTG ACTCCTCCCT 882 0 

30 

CACCGTGCAG GTGGTGGATG ACGTCAGCAA CCAGACCTCC TGCCGTCTCG CGGGCCTGAA 8 88 0 

GCCCGGCACC GTTTACTTCG TCCAAGTGCG TTGTAACCCA TTCGGGATCT ATGGGTCGAA 8 94 0 

3 5 AAAGGCGGGA ATCTGGAGCG AGTGGAGCCA CCCCACCGCT GCCTCCACCC CTCGAAGTGG 9000 

TGAGCACCTC TCCAGGGCTG GCTGGCCCAT GGAATCCCCA ATCCATCCTG TTCCTTCCCC 9060 
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CCCACCCTTT TTTTGAGACA GCGTCTTCAG GTAGCGCATG CTGGCCTTAA ATTCAGTATG 912 0 

TAGTCAAGGA TGACCTCGAG CTCCTGGTCT TTTTGTCTCC ACTTAGAGAC AATGGCCAGT 9180 

5 GGCCATCACC ACCTTTGGGA GACTAGCCAT GGAGTCTATT TAGCCTGTCA TTTGGTGACA 9240 

GATGGAGTAC AACAGTGTGA CCTCTTGTAA GAGAACTGAA GACAGGCTGT TTTTAACCCC 930 0 

AATATCCTAG GCTCTCTAGA GGTTAACTTT ATATAAAATA GAGACTATTA CAGCCAGTTA 93 60 

TCACATGGTC CCACAGAACC TTTTGTCACA CAACCTATAG ACCACAGTGC CTGTGCCTAC 94 2 0 

CACATAAGGG TCTCTACTGC TGGCCCACCC CTCCAACCCT TAAAAGGTAA CCTAGGCAGC 94 8 0 

15 CTTAATATTT GCAATCCTCC TACCTCAGCC TCTTGAATGC TCAGAAACCA GGCATTAACC 954 0 

CAAGTTTCTC TTCTCTGGGT CCCTTTCTTA AGGTGGGAGG GCCTAAAGAT GACTTCCTTT 960 0 

GTCCTGAAGA CTCTCCGAGC CCATGGATCT GCACTCTCTA ATATGAAATA TATTGCATAA 9660 

20 

AATGTCTGGC CTCAGTTTCC CCACCTGTCA GGTTTAGGCA GCACAGTCGG TCCAAGACAC 972 0 

TTCATTATTT GCAGGCAGTA TAAGAAGAAG CTCCCATCCC CCACCCGCTT CCTCCGGTCC 97 8 0 

25 CTAAGACAGA ATACTTCTAC ACTGAAACTG AACTCTCGCA GACGCATATG CTCACTTTAA 9 84 0 

TGATGATGAA ATAATGGGGA AACTGAGGCT CCGAGAGATT CCTGGAGGAA GAGGGTCAAA 990 0 

ACCAGCTCCA GGAAGCTCTC CAGCCCCCAT CCGGGCCTCT CCAGGTTCTG GGCTTGGCGG 996 0 

30 

GAGTGAACAC AGGTGGGAGG GGCTGGAGCC TGGGAGCTTT GGCCCTTGCT CGTGCCCAGC 10 020 

ACCTGCGATT CTTGCACGGG AGCCAGCAGG CGGCTGCGTC CGCCCGAGAG ACTGAAGAAG 100 80 

CCGGGGGTAG GGTTGGAGGG AGGTAAGCAG GGGCTGTGGG GGCCGAAGCT TGTGCCAGGG 10140

CCTGTCAGCG AGTCCCCAGT TTTATTTATG GCGTGAGGCC GATGTCCTTA TCCGCTGGCC 102 00 
TGCTGGGGGA TGGCTGCGGC TGGGGATTGG ACCCAAGGGC TGGCTTCCCA CTCAGTCCTC 10260

CAGCCCACTC CATGTCACAC CCGTGCATTC TCTGAGGCTT ATCTTGGGAA CCCGCCCTTG 10320

TTCTGTGCTG TCTGTCTCTA TTTCTGTCAT TCACTTTCCC AGAGCCTTTT TTTTATGCTT 10380

TTAATATAAC TACGTTTTAA AAATTGCTTT TGTATAATGT GTGTGCCTTC GTGAGCGTGC 10440

GTGCCACAAC ACACACGTGA AGGTTAGAGA ACTTTGTTGA GTAGGCTCCT TCCACCATGT 10500

GGGACTAGGG CTGGCGAGAA GAGCAATTAC TGAGTCATCT CGCCAGCCCC TCACCCCTCA 10560

CTTCCCATCC TGTTTGGATA GTCATAGGTA ATCGAAGGTA AATCGCTGGC TTTAATTTCG 10620

TAGCTATCCT GCCTCAGCCT ACCAAGTGCT GTGCTACCAC GTTTGTGGGA GGGGCTCTCC 10680

TCCCAGTGTC TGGGGGTACA CAGTCCCAAG ATCTCTGCTT TCTAGGTCTT TGTCTTAGTT 10740

TGCCCCTTGC TTTGTCCGTG TCCCTAGAGT CTCCGGCCCC ACTTAGTCTC CATTGATTTC 10800

CTTTCTGACC GAATACTCGG TTTTACCTCC CACTGATTTG ACTCCCTCCT TTGCTTGTCT 10860

CCATCGCCGT GGCATTGCCA TTCCTCTGGG TGACTCTGGG TCCACACCTG ACACCTTTCC 10920

CAACTTTCCC CAGCCGAAGC TGGTCTGGTA TGGGAGGCCG CCGTCCCGCG CGCGCCTCCT 10980

GCTGGCCGCG CCCCAACACT GCCGCTCCAT TCTCTTTAGA GCGCCCGGGC CCGGGCGGCG 11040

GGGTGTGCGA GCCGCGGGGC GGCGAGCCCA GCTCGGGCCC GGTGCGGCGC GAGCTCAAGC 11100

AGTTCCTCGG CTGGCTCAAG AAGCACGCAT ACTGCTCGAA CCTTAGTTTC CGCCTGTACG 11160

ACCAGTGGCG TGCTTGGATG CAGAAGTCAC ACAAGACCCG AAACCAGGTA GGAAAGTTGG 11220

GGGAGGCTTG CGTGGGGGGT AAAGGAGCAG AGGAAGAGAG AGACCCGGGT GAGCAGCCTC 11280

CACAACACCG CACTCTTCTT TCCAAGCACA GGACGAGGGG ATCCTGCCCT CGGGCAGACG 11340
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GGGTGCGGCG AGAGGTAAGG GGGTCTGGGT GAGTGGGGCC TACAGCAGTC TAGATGAGGC 114 0 0 

CCTTTCCCCT CCTTCGGTGT TGCTCAAAGG GATCTCTTAG TGCTCATTTC ACCCACTGCA 114 60 

5 AAGAGCCCCA GGTTTTACTG CATCATCAAG TTGCTGAAGG GTCCAGGCTT AATGTGGCCT 11520 

CTTTTCTGCC CTCAGGTCCT GCCGGCTAAA CTCTAAGGAT AGGCCATCCT CCTGCTGGGT 11580 

CAGACCTGGA GGCTCACCTG AATTGGAGCC CCTCTGTACC ATCTGGGCAA CAAAGAAACC 1164 0 

TACCAGAGGC TGGGCACAAT GAGCTCCCAC AACCACAGCT TTGGTCCACA TGATGGTCAC 1170 0 

ACTTGGATAT ACCCCAGTGT GGGTAGGGTT GGGGTATTGC AGGGCCTCCC AAGAGTCTCT 11760 

15 TTAAATAAAT AAAGGAGTTG TTCAGGTCCC GATGGCCAGT GTGTTTGGGG CCTATGTGCT 11820 

GGGGTGGGGG GA 11832 



(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

Val Ile Ser Pro Gln Asp Pro Thr Leu Leu Ile Gly Ser Ser Leu Gln Ala Thr Cys Ser
1               5                   10                  15                  20

Ile His Gly Asp Thr Pro
                25
1. A nucleic acid molecule comprising a sequence of
nucleotides encoding or complementary to a sequence encoding a
novel haemopoietin receptor or derivative thereof having the
motif:

Trp Ser Xaa Trp Ser [SEQ ID NO:1],
wherein Xaa is any amino acid.

2. A nucleic acid molecule according to claim 1 wherein Xaa
is Asp or Glu.

3. A nucleic acid molecule according to claim 1 or 2 wherein
said nucleic acid molecule is capable of hybridisation under
low stringency conditions at 42°C to:

5' (A/G)CTCCA(A/G)TC(A/G)CTCCA 3' [SEQ ID NO:7]; and
5' (A/G)CTCCA(C/T)TC(A/G)CTCCA 3' [SEQ ID NO:8].
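
The relationship between the two degenerate oligonucleotides of claim 3 and the motif of claim 1 can be checked with a short sketch. It assumes the standard genetic code and assumes that each oligonucleotide is read as an antisense strand, so that its reverse complement is translated; the function names are illustrative only and do not come from the specification.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand(pattern):
    """Expand a pattern such as '(A/G)CT' into every concrete oligonucleotide."""
    choices, i = [], 0
    while i < len(pattern):
        if pattern[i] == "(":
            close = pattern.index(")", i)
            choices.append(pattern[i + 1:close].split("/"))
            i = close + 1
        else:
            choices.append([pattern[i]])
            i += 1
    return ["".join(p) for p in product(*choices)]


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the first base onward."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def encoded_peptides(pattern):
    """Peptides encoded by the reverse complement of every expansion."""
    return {translate(reverse_complement(oligo)) for oligo in expand(pattern)}


SEQ_ID_7 = "(A/G)CTCCA(A/G)TC(A/G)CTCCA"
SEQ_ID_8 = "(A/G)CTCCA(C/T)TC(A/G)CTCCA"

for name, pattern in (("SEQ ID NO:7", SEQ_ID_7), ("SEQ ID NO:8", SEQ_ID_8)):
    print(name, len(expand(pattern)), sorted(encoded_peptides(pattern)))
```

Under these assumptions each pattern expands to eight oligonucleotides; those of SEQ ID NO:7 all correspond to Trp Ser Asp Trp Ser and those of SEQ ID NO:8 to Trp Ser Glu Trp Ser, matching the Asp or Glu alternatives of claim 2.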

4. A nucleic acid molecule according to claim 3 comprising a
sequence of nucleotides substantially as set forth in SEQ ID
NO:12 or a nucleotide sequence having at least 60% similarity
to the nucleotide sequence set forth in SEQ ID NO:12 or a
nucleotide sequence capable of hybridising thereto under low
stringency conditions at 42°C.
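
A percentage figure of the kind recited in claim 4 can be illustrated with a minimal sketch. The definition of "similarity" used by the specification is not restated in the claims, so the sketch assumes the simplest reading: percent identity over an ungapped alignment of two equal-length sequences. The function name and the choice of example sequences are illustrative; the second sequence is a 27-nucleotide stretch copied from the SEQ ID NO:38 listing (the rows numbered 6360 and 6420), compared here with SEQ ID NO:36.

```python
def percent_identity(a, b):
    """Percentage of matching positions between two equal-length, ungapped sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


SEQ_ID_36 = "CCCACGCTTCTCATCGGATTCTCCCTG"
# Assumption: stretch taken by eye from the SEQ ID NO:38 listing for illustration.
SEQ_ID_38_STRETCH = "CCCACCCTTCTCATCGGCTCCTCCCTG"

score = percent_identity(SEQ_ID_36, SEQ_ID_38_STRETCH)
print(f"{score:.1f}% identity over {len(SEQ_ID_36)} positions")
print("meets a 60% threshold:", score >= 60.0)
```

A real comparison would normally use a gapped alignment over the full-length sequences; the fixed-length comparison above only shows how the percentage is computed.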

5. A nucleic acid molecule according to claim 3 comprising a
sequence of nucleotides substantially as set forth in SEQ ID
NO:14 or a nucleotide sequence having at least 60% similarity
to the nucleotide sequence set forth in SEQ ID NO:14 or a
nucleotide sequence capable of hybridising thereto under low
stringency conditions at 42°C.

6. A nucleic acid molecule according to claim 3 comprising a
sequence of nucleotides substantially as set forth in SEQ ID
NO:16 or a nucleotide sequence having at least 60% similarity
to the nucleotide sequence set forth in SEQ ID NO:16 or a
nucleotide sequence capable of hybridising thereto under low
stringency conditions at 42°C.

7. A nucleic acid molecule according to claim 3 comprising a
sequence of nucleotides substantially as set forth in SEQ ID
NO:18 or 24 or a nucleotide sequence having at least 60%
similarity to the nucleotide sequence set forth in SEQ ID NO:18
or 24 or a nucleotide sequence capable of hybridising thereto
under low stringency conditions at 42°C.

8. A nucleic acid molecule according to claim 3 comprising a
sequence of nucleotides substantially as set forth in SEQ ID
NO:28 or a nucleotide sequence having at least 60% similarity
to the nucleotide sequence set forth in SEQ ID NO:28 or a
nucleotide sequence capable of hybridising thereto under low
stringency conditions at 42°C.

9. A nucleic acid molecule according to claim 3 comprising a
sequence of nucleotides substantially as set forth in SEQ ID
NO:38 or a nucleotide sequence having at least 60% similarity
to the nucleotide sequence set forth in SEQ ID NO:38 or a
nucleotide sequence capable of hybridising thereto under low
stringency conditions at 42°C.

10. A nucleic acid molecule according to claim 4 or 5 or 6 or
7 or 8 or 9 wherein said haemopoietin receptor is of murine
origin.

11. A nucleic acid molecule according to claim 9 wherein said
haemopoietin receptor is of human origin.

12. An expression vector comprising a nucleic acid molecule
selected from the list consisting of:

(i) a nucleotide sequence as set forth in SEQ ID NO:12;
(ii) a nucleotide sequence as set forth in SEQ ID NO:14;
(iii) a nucleotide sequence as set forth in SEQ ID NO:16;
(iv) a nucleotide sequence as set forth in SEQ ID NO:18;
(v) a nucleotide sequence as set forth in SEQ ID NO:24;
(vi) a nucleotide sequence as set forth in SEQ ID NO:28; and
(vii) a nucleotide sequence as set forth in SEQ ID NO:38.

13. A method for cloning a nucleotide sequence encoding a
haemopoietin receptor having the characteristics of NR6 or a
derivative thereof, said method comprising searching a
nucleotide database for a sequence which encodes an amino acid
sequence as set forth in one or more of SEQ ID NO:1, SEQ ID
NO:7 and/or SEQ ID NO:8, designing one or more oligonucleotide
primers based on the nucleotide sequence located in said
search, screening a nucleic acid library with said one or more
oligonucleotides and obtaining a clone therefrom which encodes
NR6 or a part or derivative thereof.
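
The search step of claim 13 can be sketched on a single sequence as a scan of all six reading frames for the Trp Ser Xaa Trp Ser motif. This is only an illustration under stated assumptions: the standard genetic code is used, the scan runs over one local sequence rather than a nucleotide database, and the function names are hypothetical. The test fragment is the first 40 nucleotides of the SEQ ID NO:38 row numbered 9000.

```python
import re

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Lookahead so overlapping matches are reported; Xaa may not be a stop codon.
MOTIF = re.compile(r"(?=(WS[^*]WS))")


def translate(seq):
    """Translate complete codons from the first base onward."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def find_wsxws(dna):
    """Return (strand, frame, peptide) for each motif match in the six frames."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", dna.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            protein = translate(seq[frame:])
            hits.extend((strand, frame, m.group(1)) for m in MOTIF.finditer(protein))
    return hits


FRAGMENT = "AAAGGCGGGAATCTGGAGCGAGTGGAGCCACCCCACCGCT"
print(find_wsxws(FRAGMENT))
```

A sequence located this way would then serve as the basis for primer design and library screening as recited in the claim; those laboratory steps are outside the scope of the sketch.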

14. An isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a haemopoietin receptor or derivative
thereof having an amino acid sequence substantially as set
forth in SEQ ID NO:13 or having at least about 50% similarity
thereto.

15. An isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a haemopoietin receptor or derivative
thereof having an amino acid sequence substantially as set
forth in SEQ ID NO:15 or having at least about 50% similarity
thereto.

16. An isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a haemopoietin receptor or derivative
thereof having an amino acid sequence substantially as set
forth in SEQ ID NO:17 or having at least about 50% similarity
thereto.

17. An isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a haemopoietin receptor or derivative
thereof having an amino acid sequence substantially as set
forth in SEQ ID NO:19 or having at least about 50% similarity
thereto.

18. An isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a haemopoietin receptor or derivative
thereof having an amino acid sequence substantially as set
forth in SEQ ID NO:25 or having at least about 50% similarity
thereto.

19. An isolated nucleic acid molecule comprising a sequence of
nucleotides encoding a haemopoietin receptor or derivative
thereof having an amino acid sequence substantially as set
forth in SEQ ID NO:29 or having at least about 50% similarity
thereto.

20. An isolated novel haemopoietin receptor comprising the
amino acid motif:

Trp Ser Xaa Trp Ser [SEQ ID NO:1]

wherein Xaa is any amino acid.

21. An isolated haemopoietin receptor according to claim 20
wherein Xaa is Asp or Glu.

22. An isolated haemopoietin receptor according to claim 21
comprising the amino acid sequence substantially as set forth
in SEQ ID NO:13.

23. An isolated haemopoietin receptor according to claim 21
comprising the amino acid sequence substantially as set forth
in SEQ ID NO:15.

24. An isolated haemopoietin receptor according to claim 21
comprising the amino acid sequence substantially as set forth
in SEQ ID NO:17.

25. An isolated haemopoietin receptor according to claim 21
comprising the amino acid sequence substantially as set forth
in SEQ ID NO:19.

26. An isolated haemopoietin receptor according to claim 21
comprising the amino acid sequence substantially as set forth
in SEQ ID NO:25.

27. An isolated haemopoietin receptor according to claim 21
comprising the amino acid sequence substantially as set forth
in SEQ ID NO:29.

28. A method for modulating expression of NR6 in a mammal,
said method comprising contacting a genetic sequence encoding
said NR6 with an effective amount of a modulator of NR6
expression for a time and under conditions sufficient to
up-regulate or down-regulate or otherwise modulate expression of
NR6, wherein the genetic sequence encoding said NR6 is selected
from the nucleotide sequence set forth in SEQ ID NO:12 or 14 or
16 or 18 or 24 or 28 or 38 or is a sequence having at least
about 60% similarity to at least one of SEQ ID NO:12 or 14 or
16 or 18 or 24 or 28 or 38 and is capable of hybridising
thereto under low stringency conditions at 42°C.

29. A method of modulating activity of NR6 in a mammal, said
method comprising administering to said mammal, a modulating
effective amount of a molecule for a time and under conditions
sufficient to increase or decrease NR6 activity wherein said
NR6 comprises an amino acid sequence:

(i) encoded by a nucleotide sequence selected from the
nucleotide sequence set forth in SEQ ID NO:12 or 14 or 16
or 18 or 24 or 28 or 38 or a nucleotide sequence having
at least 60% similarity to the nucleotide sequence set
forth in SEQ ID NO:12 or 14 or 16 or 18 or 24 or 28 or 38
and which is capable of hybridising thereto under low
stringency conditions at 42°C; and
(ii) substantially as set forth in SEQ ID NO:12 or 14 or 16 or
18 or 32 or 30 or a sequence having at least 50%
similarity thereto.

30. A pharmaceutical composition comprising an NR6 receptor in
soluble form and one or more pharmaceutically acceptable
carriers and/or diluents wherein said NR6 comprises the amino
acid sequence:

(i) encoded by a nucleotide sequence selected from the
nucleotide sequence set forth in SEQ ID NO:12 or 14 or 16
or 18 or 24 or 28 or 38 or a nucleotide sequence having
at least 60% similarity to the nucleotide sequence set
forth in SEQ ID NO:12 or 14 or 16 or 18 or 24 or 28 or 38
and which is capable of hybridising thereto under low
stringency conditions at 42°C; and
(ii) substantially as set forth in SEQ ID NO:12 or 14 or 16 or
18 or 32 or 30 or a sequence having at least 50%
similarity thereto.



31. An isolated antibody or a preparation of antibodies to an
NR6 receptor, said NR6 receptor comprising the amino acid
sequence:

(i) encoded by a nucleotide sequence selected from the
nucleotide sequence set forth in SEQ ID NO:12 or 14 or 16
or 18 or 24 or 28 or 38 or a nucleotide sequence having
at least 60% similarity to the nucleotide sequence set
forth in SEQ ID NO:12 or 14 or 16 or 18 or 24 or 28 or 38
and which is capable of hybridising thereto under low
stringency conditions at 42°C; and
(ii) substantially as set forth in SEQ ID NO:12 or 14 or 16 or
18 or 24 or 28 or 38 or a sequence having at least 50%
similarity thereto.

32. A transgenic animal comprising a mutation in at least one
allele of the gene encoding NR6.

33. A transgenic animal according to claim 33 comprising a
mutation in two alleles of the gene encoding NR6.

34. A transgenic animal according to claim 33 or 34 wherein
said animal is a murine animal.
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7/43 
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3/43 


g1 cccagaactct
g38 agtttcaagacagtgtgtt
g83 aagaaaagaaataaagaga
g128 cagcttggtgggtaagggg
g173 agccccatccctaggaatc
g218 cagctgctgacctccatac
g263 ggagacataatcaattaat
g308 ggcatttatgactgatgtt
g353 aatatacctgtttgtattt
g398 atttgagacagggcttctc
g443 tcactctgtagaccaggct
g488 ttgtgcttcccaagtgctt
g533 gcaaaattgcatactttaa
g578 actaatgtgtgaattccag
g623 ctattcttaccctcccccc
g668 ttgtgtatgtacatgtgtg
g713 acttgtagaagttctctcc
g758 actaaggtcctcaggctta
g803 catttcactggccctggat
g848 aggtctcttgtagctctag
g893 gtcatcttgagctgctggt
g938 aatgatactcaggcagcac
g983 ccttgattttgttgcctca
g1028 gtttcttttctttatctgt
g1073 ttcctgactcttgaaacat


tggacgctgaggcaggaggattccca 

tctaggtaatgagaccctgtcaagaa 

caagaaaatgtttataggctgtgaga 

cacttgcctccaatcaagatgacctc 

catggtagaaggagaaagcaaactcg 

atgtgctccaatgtgcacacacacag 

aggatgtatttgcttagatttgagta 

ttaaaatttttatttgattttatgaa 

ggtttggtttggtttgagttttgttt 

tgtgtagtcctggctgtccttggaac 

ggccttgaactcagaaatccgcctgc 

agattaaaggtgtgcactgccattca 

ccccagtatttgggaggcagaggcag 

gctagccaaggatacagagtgagacc 

ccaaaaccccaaaatgtattttgtgc 

ttgcagcacgtaaatgtccaaggaca 

gttcacagtctaagtcctgaattcaa 

gccacagtcttctttatgtactgagc 

tgactgatgaattaatttttgagata 

ctaggctcaaactatgaactcccaag

actcttgcttccaccccaagtggtgg 

ttctctggggaaggggctggccttgg 

gcttcaatgagtgcttgggtctcgtt

gaaatgggtgaacacctgttcaagac 

ccaggcagggtgagggacttgaagtg

g1118 ggctcatcccatgcctaac
g1163 agctgtaatcagcccccag
L Q A T C S
g1208 CCTGCAAGCTACCTGCTCT
A E G L Y W
g1253 CGCTGAGGGGCTCTACTGG
E L S R L L
g1298 TGAGCTGTCCCGCCTCCTT
A N L N G S
g1343 GGCTAACCTTAATGGGTCC
C H A R D G
g1388 GTGTCACGCCCGAGACGGC
V G
g1433 TGTTGGCTgtaagtggggc
g1478 ttggcaatgacagatttag
g1523 agccatgggctctcacttg
g1568 aggcattgcaactctaggg
g1613 gtaccccacagctttagaa

aaagtgtcgtctttgaccccagacac 
D P T L L I G S S 
GACCCCACCCTTCTCATCGGCTCCTC 

IHGDTPGAT 
ATACATGGAGACACACCTGGGGCCAC 



TFNGRRLPS 
ACCTTCAATGGTCGCCGCCTGCCCTC 

NTSTLALAL 
AACACCTCCACCCTGGCCCTGGCCCT 

RQQSGDNLV 
AGGCAGCAGTCAGGAGACAATCTGGT 



SILAGSCLY 
AGCATTCTGGCTGGCTCCTGCCTCTA 



cccagacactcagagatagatggggg 

agcctgggtcttctgtcctggggcag 
catgcaggcatggtcatacccagcac 
acagctgtggctgcactgtcccctgt 

L 

aagctgtcatgttttccttgtagTGC



Fig.2(iv) 
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P P E K P F N 
g1658 CCCCTGAGAAGCCCTTTAA
K D L T C R W
g1703 AGGATCTCACGTGCCGCTG
F L H T N Y S
g1748 TCTTACATACCAACTACTC
g1793 ccagccaagccttgctgtg
g1838 tgatcaaatatgttcctgt
W Y G
g1883 cctccacagGTGGTACGGT
T V G P H S
g1928 CACTGTGGGCCCTCACTCA
F T P Y E I
g1973 CTTCACTCCCTATGAGATC
S A R S D V
g2018 CTCAGCAAGATCTGATGTC
g2063 tgagcccccagtgtccacc
g2108 cgcctcccccccatccccc
g2153 ttagccacagccacggtgg
g2198 taatgcaaagactttcccc



Fig.2(v) 
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ISCWSRNM 
CATCAGCTGCTGGTCCCGGAACATGA 



TPGAHGET 
GACACCGGGTGCACACGGGGAGACAT 

L K Y K L R 
CCTCAAGTACAAGCTGAGgttggtac
tgacttctggcaatacttaccttctc 
ttatgaactcaaaagggactctcgca 

QDNTCEEYH 
CAGGATAACACATGTGAGGAGTACCA 

CHIPKDLAL 
TGCCATATCCCCAAGGACCTGGCCCT 

WVEATNRLG 
TGGGTGGAAGCCACCAATCGCCTAGG 

LTLDVLDV
CTCACACTGGATGTCCTGGACGTGGg

tgtgttctgccctagaccttataggg 
cagactttttggttcttctagaggtc 
ttgcaggacagtggttgttcataact 
caagacagtcaagatttttcccctcc 



Fig.2(vi)

g2243 ccacccccaacacacacat
g2288 ggcctgaccaccctccctc
g2333 gtcctaggggactgagagg
g2378 ggaagccgaggccttgagc
g2423 acgaactggatgatccctg
g2468 ggtgttcccagcccaaagc
g2513 gcctcactgaagactcagg
g2558 tggtcccccaggagggttc
g2603 tccagaggttttgtgtctt
g2648 ctgtggctggcacagctgc
g2693 aggcatcagaggtggacat
g2738 caaatagcacctcaaggtg
g2783 cctgacgctcagaaagcct
g2828 tcactctgggacatgtagt
g2873 tagctttaagagtcagctt
g2918 taataggtgctgggtgatg
g2963 tctctgcgctaatctccac
g3008 cttgagggcaggaatgtgt
g3053 gtagcagcaactgctgctg
g3098 taatctatcaggcctgggt
g3143 gtctggaaaacgcagatag
g3188 ttacaccactgggtgttct
g3233 tcctcagaactgggagcac
g3278 taatgccagcattagggga
g3323 ttcaaggccatcctgaatt
g3368 ggtgcgcagtaaaaccttg

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acacacacactctgcagagaacacct 
tctacagcccaggtgttcagaaggga 
aggcgcccaggtctgaaggcgcccca 

tgggggggggggcgagggttggaggc

agcacaactgggcctaatctaattag 

agcctgggccatttaacccttcaagt 

ggagagatcagcttgtactctctcca 

ctgggtgcccctggctcattcccaca 

cctggcatctaaccctcagttgtgct 

cccgtggaggctcttggtaatgtaca

gggatggggatacatagggatggagc 

gggtgatatacaataaagcttgtcac 

actcatgatgatcacaattgttgaca 

gagaccctagctcaaaacacagacag 

gtgacttaatactggaactcagggcc 

ctcgcctcactccctgtttagtgaga 

cccagctgggtgggctgctctgtccc 

gtcttccatcagagataggacccgtg 

gctgtttctggaatattaaatgacag 

gagtagctaacaggggtgggggcgtg 

ggtcataggagccactgcagcctaga 

gtcactaggccattctcaccaagcag 

tgttgccagcatttaatgccagcatt 

ggcagaggcagaaggatctctctgag 

tacataaagagctccaggccagccag 

tctcaaaaaacaaagcatctttagtg

g3413 accaggcttgctccacccc
V H V S R V G
g3458 GTGCACGTGAGCCGCGTTG
R W V S P P
g3503 CGCTGGGTCTCACCACCAG
K Y Q I R Y
g3548 AAGTACCAGATCCGCTACC
g3593 gtgcccgtcccgccccgga
g3638 ctgactcctccctcaccgt
Q T S C R L A
g3683 AGACCTCCTGCCGTCTCGC
F V Q V R C N
g3728 TCGTCCAAGTGCGTTGTAA
K A G I W S E
g3773 AGGCGGGAATCTGGAGCGA
T P R S
g3818 CCCCTCGAAGTGgtgagca
g3863 aatccccaatccatcctgt



Fig.2(ix) 
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VTTDPPPD 
cagTGACCACGGACCCCCCACCCGAC 

GLEDQLSV 
GGGGCCTGGAGGACCAGCTGAGTGTg 

ALKDFLFQA 
CTCTCAAGGATTTCCTCTTCCAAGCC 

RVEDSVD WK 

GCGTGGAGGACAGCGTGGACTGGAAG 

cccgcccctgaccccgccccccgcat 

V V D D V S N 
gcag GTGGTGGATGACGTCAGCAACC 

GLKPGTVY 
GGGCCTGAAGCCCGGCACCGTTTACT 

PFGIYGSK 
CCCATTCGGGATCTATGGGTCGAAAA 

WSHPTAAS 
GTGGAGCCACCCCACCGCTGCCTCCA 



cctctccagggctggctggcccatgg
tccttcccccccaccctttttttgag 



Fig.2(x)

g3908 acagcgtcttcaggtagcg
g3953 gtcaaggatgacctcgagc
g3998 gacaatggccagtggccat
g4043 agtctatttagcctgtcat
g4088 tgacctcttgtaagagaac
g4133 tatcctaggctctctagag
g4178 ttacagccagttatcacat
g4223 acctatagaccacagtgcc
g4268 tgctggcccacccctccaa
g4313 taatatttgcaatcctcct
g4358 ccaggcattaacccaagtt
g4403 gtgggagggcctaaagatg
g4448 agcccatggatctgcactc
g4493 tgtctggcctcagtttccc
g4538 cggtccaagacacttcatt
g4583 cccatcccccacccgcttc
g4628 tacactgaaactgaactct
g4673 atgatgaaataatggggaa
g4718 gaagagggtcaaaaccagc
g4763 gggcctctccaggttctgg
g4808 aggggctggagcctgggag
g4853 ctgcgattcttgcacggga
g4898 gagactgaagaagccgggg
g4943 gctgtgggggccgaagctt
g4988 agttttatttatggcgtga
g5033 ctgggggatggctgcggct

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catgctggccttaaattcagtatgta 

tcctggtctttttgtctccacttaga 

caccacctttgggagactagccatgg 

ttggtgacagatggagtacaacagtg 

tgaagacaggctgtttttaaccccaa 

gttaactttatataaaatagagacta 

ggtcccacagaaccttttgtcacaca 

tgtgcctaccacataagggtctctac 

cccttaaaaggtaacctaggcagcct 

acctcagcctcttgaatgctcagaaa 

tctcttctctgggtccctttcttaag 

acttcctttgtcctgaagactctccg 

tctaatatgaaatatattgcataaaa 

cacctgtcaggtttaggcagcacagt 

atttgcaggcagtataagaagaagct 

ctccggtccctaagacagaatacttc 

cgcagacgcatatgctcactttaatg 

actgaggctccgagagattcctggag 

tccaggaagctctccagcccccatcc 

gcttggcgggagtgaacacagctggg 

ctttggcccttgctcgtgcccagcac 

gccagcaggcggctgcgtccgcccga 

gtagggttggagggaggtaagcaggg 

gtgccagggcctgtcagcgagtcccc 

ggccgatgtccttatccgctggcctg 

ggggattggacccaagggctggcttc

g5078 ccactcagtcctccagccc
g5123 tgaggcttatcttgggaac
g5168 ctatttctgtcattcactt
g5213 aatataactacgttttaaa
g5258 ttcgtgagcgtgcgtgcca
g5303 tttgttgagtaggctcctt
g5348 caagagcaattactgagtc
g5393 tcccatcctgtttggatag
g5438 ggctttaatttcgtagcta
g5483 gctaccacgtttgtgggag
g5528 gacacagtcccaagatctc
g5573 gccccttgctttgtccgtgt
g5618 cattgactggtctttcctt
g5663 ctgatttgactccctcctt
g5708 ccattcctctgggtgactc
g5753 actttccccagccgaagct
g5798 gcgcgcgcctcctgctggc
E R P G
g5843 tctttagAGCGCCCGGGCC
G G E P S S
g5888 GGCGGCGAGCCCAGCTCGG
F L G W L K
g5933 TTCCTCGGCTGGCTCAAGA
F R L Y D Q
g5978 TTCCGCCTGTACGACCAGT



Fig.2(xiii) 
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actccatgtcacacccgtgcattctc 

ccgcccttgttctgtgctgtctgtct 

tcccagagccttttttttatgctttt 

aattgcttttgtataatgtgtgtgcc 

caacacacacgtgaaggttagagaac 

ccaccatgtgggactagggctggcga 

atctcgccagcccctcacccctcact 

tcataggtaatcgaaggtaaatcgct 

tcctgcctcagcctaccaagtgctgt 

gggctctcctcccagtgtctgggggt 

tgctttctaggtctttgtcttagttt 

ccctagagtctccggccccacttatc 

taccgaatactcggttttacctccca 

tgcttgtctccatcgccgtggcattg 

tgggtccacacctgacacctttccca 

ggtctggtatgggaggccgccgtccc 

cgcgccccaacactgccgctccattc 

PGGGVCEPR 
CGGGCGGCGGGGTGTGCGAGCCGCGG 

GPVRRELKQ 
GCCCGGTGCGGCGCGAGCTCAAGCAG 
KHAYCSNLS 
AGCACGCATACTGCTCGAACCTTAGT 

WRAWMQKSH 
GGCGTGCTTGGATGCAGAAGTCACAC



Fig.2(xiv) 
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K T R N Q V 
g6023 AAGACCCGAAACCAGGTAG 

G K G A E E 
g6068 GGTAAAGGAGCAGAGGAAG 

Q H R T L L 

g6113 CAACACCGCACTCTTCTTT 

P R A D G V 
p S G R R G A 
g6158 CCTCGGGCAGACGGGGTGC 
g6203 GTGGGGCCTACAGCAGTCT
g6248 TGTTGCTCAAAGGGATCTC
g6293 GAGCCCCAGGTTTTACTGC

g6338 CTTAATGTGGCCTCTTTTC

*

g6383 CTAAGGATAGGCCATCCTC
g6428 CTGAATTGGAGCCCCTCTG
g6473 CCAGAGGCTGGGCACAATG
g6518 ACATGATGGTCACACTTGG
g6563 GGTATTGCAGGGCCTCCCA
g6608 TTGTTCAGGTcccgatggc
g6653 ggtgggggga



Fig.2(xv) 
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GKLGEACVG 
GAAAGTTGGGGGAGGCTTGCGTGGGG 

ERDPGEQPP 
AGAGAGACCCGGGTGAGCAGCCTCCA 

S K H R T R G S C 

D E G I L 
CCAAGCACAGGACGAGGGGATCCTGC 



RREVRGSG* 
A R 

GGCGAGAGGTAAGGGGGTCTGGG TGA 
AGATGAGGCCCTTTCCCCTCCTTCGG 
TTAGTGCTCATTTCACCCACTGCAAA 
ATCATCAAGTTGCTGAAGGGTCCAGG 

V L P A K L 
G P A G * 
TGCCCTCAGGTCCTGCCGGCTAAACT 



CTGCTGGGTCAGACCTGGAGGCTCAC 

TACCATCTGGGCAACAAAGAAACCTA 
AGCTCCCACAACCACAGCTTTGGTCC 
ATATACCCCAGTGTGGGTAGGGTTGG 
AGAGTCTCTTTAAATAAATAAAGGAG 
cagtgtgtttggggcctatgtgctgg

Fig.2(xvi)

Fig.S 
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GCGGCCGCTG CAGTGATTAC TCACCGCGTG 
TTTTTCCGTG GGGGGATGTG AAGAAGTTTA 
GGAATGCAGG GTTCGGTCCC GTTCCCCAAA 
AAGGGCTCCC TGCACGCGCT CCGGGACATC 
TGAGAAGGGA CCAGAGGCCG GAGACTCCCT 
ACGAAACGAG ACTACAGCGA TGGGAGAGGT 
GACCCATGCA CCCAGAGAAA GGGACTGGTG 
AGGGCTGAAA GAGGATGAAC GGGCTCAGGT 
TGGGTATGGG GGCCCCGTAA GAGGGGCGGG 
GGAGGGGATC CTGGAAAAGC ACCAGGGCTG 
ACAGGATCCC AGATGAGGGG GTGGGAAGCC 
CACGGGCTGG TGGGGAAAGA GTGGGGGGCT 
GTAACTGGGC GGAGGCCGGC CGGGCGGGGC 
GTGCGGGGCC CACGATCAAC CCCCCCCCAG 
CGGGGCGAGC GGCGCATTAG CGCCTTGTCA 
CGCTGTCCGC GCCCAGTGAC GCGCGTGAGG 
CGCCCCCGCC CCATACCGGC GTTGCAGTCA 
GGGTCGCCCG GGCCCCGTCG CCCAATCCGC 



Fig.3(i)

GCGCACCCCA CCCGCGGGCC GCTGAGTCjCjA 60
GGGAGAACTC TTCTGCACCG ATGGGAACiA 120
GGACACACCT CTCCCCATAA GCCCACTCAi 180
CCCATATCCA ATACCCGCAG ATATGATAGi 240
CCCTGCCTTC TGGCTTTCCC CCCCCCCTGC 300
GGCATGAAGG CTTAGGGTGG GGATCGGTAG 360
GCAACTTTCA AACTCTCTGG GGAAGGAAGA 420
ACTGCTCAAT GTGTGTGTGG CGGACCAAAG 480
GAAGGTGGAT AGGAAGGATC CCGGTAGACT 540
CGAGCTAGGA ACCCATTCGG AGTTAAGGGT 600
TGGGACGGGC GGGACCAGAG AGGGAGGTCC 660
TCGCGCAGGA GGATGGGACG TTCAGGAGTG 720
GCGCGGTGCC CGCGGGCGGT GGGAAGGCCG 780
GGGCCGGGCC GGGCCGGGGG CGGGGCCGGG 840
ATTTCGGCTG CTCAGACTTG CTCCGGCCTT 900
ACCCGAGCCC CAATCTGCAC CCCGCAGACT 960
CCGCCCGTTG CGCGCCACCC CCATGCCCGC 1020
GCGGCGGCCG CCGCGGCCGC TGTCCTCGCT 1080



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GTGGTCGCCT CTGTTGCTCT GTGTCCTCGG 
GTACCGTGCG CCCTGCTCCC CACCTCCCCA 
AGTCGCGGGG GATGGAAGAA GGGGCGCGAG 
GGCGGCCCTC GGGGCGCCCT CACCTGTGGG 
AGTACCCCGT TATACATCAG AGGCCTCTTA 
AGGCTCAGTT TGAAGGACAT CGCAGTGTCC 
GCTTCGGGGC GCACGCCTGT GTCTTGGATA 
GGGCGCACGC TTGGGTGCGT TGGGTTGGGT 
GAAGTGATGA TCCCCGGGGG GAGGGTGGGG 
ATGCGGCCCG GCGTCCCTCG GGACTTGCCT 
CTATAGCAGA CTCCATGCTT TGGTATCCTC 
CGGTCTGATT CAGGCTGCGC TGGGTTGAGA 
CGAGAGCAAG CGTGTCCGGG CACCGCGAGC 
GGGGGTCAGC TGCCGAGAGA ATCCCACTGT 
ATCACCCAAC GCACACATCC CCGCCAGGAT 
CACACCCAAA GACACACAAA AGAGCCCCAC 
CGCGCGCTGC AGCCCAGATG CGTATTCGCA 
ACACACACAC ACACACACAC ACACACACAC 



Fig.3(iii)

GGTGCCTCGG GGCGGATCGG GAGCCCGTGA 1140
GGGAAGCCGG GATCCGGCGC CCCGGGGGGT 1200
CGCCACCTGG ACGTCCCGGG AACAAAGGAA 1260
GCTCATGGCA CCACCACCCA GCCTCCCAAG 1320
TCTGTATCCC CTTTGCGAGG CTGTCTGGCC 1380
TGGGACCCCC CTCCTTCAGG GTGCTGGGAC 1440
TCAGAGCGGA AGGGAAGCCT CCCTGGCCGG 1500
GCTGGCGCAA AGTGGGGTCC CCTCCCCCAT 1560
CGTTATCGTG AGCCCTCCTG TCCGCCTGGC 1620
CTCCGTGGGG TCGGCGCCGC CCCCTCCCCC 1680
GAAGTCCTCT CCACTGGTGG GGCTCACAAC 1740
GCCTCTAGCG ACTGAAATTT CGGTGAGGAG 1800
CCAGACTTCA TTGTCTAAGG GGCACCCAGT 1860
C C C A(jr(j ACjVjA 1920
GCGGTCTCCA CATCCAGACC CTCTCTGGGA 1980
TGGCTTATGT CCCGTCACCC TGCCCTCCGA 2040
CACCATCGCG GCGCTCGCAT TCCATCCTCT 2100
ACACACACAC ACACACAGAC ACGCACACAC 2160



Fig.3(iv) 
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ACACGCACGC ACACACACGC ACGCCCGCAC 
GCAACACCGG GGTACGCATA TGGTTGAGTG 
ACCCCATCCG GAGACACAGG CCACACCGCA 
TAGTAGTCTT GTGCAGTTTG TCCGCGGTGT 
ACAGGAACCT ACACTCCTGC TTGCCCAAGG 
GACCTTTCCG GGGAGTTGGT GTTGCTGCCA 
GCGCTAAGCT TTGTTTCCGG GCGGGCTGCA 
TGGCGCGTGT GTTTTTTCTT TTAAGGGGGA 
TGCAATCTGT TTGTACTTAC CGTGTGTCTT 
AAAGTGTATG CAGGTACCAG CGGGACAGGA 
GAGGCCACCT TCCCGTTGGC CTTTCAGGGA 
GTGTTCTTTT TAATAACGGC AGCAACTCCG 
GGCCCCGGCT TTGTGGAAAG GAGGGGAAGA 
GGCTTAGGGG GCTGTCAGCT GCTGCTCTGT 
AGTGGCTTTG GCCCATTGTT TGTGGAAGCC 
TACTCCAGAG TCAGGCTTCT CAGTCCGAGC 
GAATCAGGGA AGGGGGTGCC AGGTGGACTA 
AAGGAGAAAG CTTGGGCTTG CCCCCCTCCC 

Fig.3(v)

TCGTGGTCCC ACATTTATTT CACAGGGGAG 2220
CACTGGAGAT CTTTCCCCAC CACTCTCAGG 2280
GGGGCACCAC GCTGCGCTGC TGCTCTGGGC 2340
CTGTGGACGC CCTCCCGCTC TTGTCAGGGG 2400
CGGCTGGGCA GGTGATGTGG TGACACCCGG 2460
AGCCTGGGTA GTTTTTGAAT GCCACCAATA 2520
GAGCAACAGG CGAAGGTGGC GGAGTGGGGG 2580
[illegible] ATAAGAGGTT CTCACACCTC 2640
[illegible] CAGCCAGCCG GTGGGTCGTA 2700
GATGGGGGCC CCTGGGGTAT GGCTGGGATG 2760
ATPTCACACT TTTCCCTTTT AAAACACATG 2820
CATTGGGAAA GGGGGAAATA AGCTTGTATA 2880
nGGAAGAAAA AAGGAGGGGT GTCTCCTCCA 2940
CTAGCTTGGC ATGTGTGTGC CCCAGTCCCC 3000
AAGAGGGAGA CTGGAGTCCT CTATCTCTGG 3060
CCAGAGAACG TCTTCCCTGT TTTATGGAGG 3120
CGTTCTGCTG AGGACTGTAC CAGTCGCTCG 3180
CCCTCAAGCC ACGAAGGGCA GCTGCTAGGC 3240



Fig.3(vi) 
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TAGTGTGGTA AAAGGGCATT ACTCCCCAGC 
CAGACAAATG CTGGGGAGGG ACAGAGGGGT 
GGTCCCGGGT CGGGCAGTGC CTCCCACCCT 
GGGTGGGCCG GGGTAGAGAC GCTGGCACGT
GCGGGCGGCT GGCTGCCTGG GACCTCCGGG 
GCCTGCTCCT CCTGCTCCTT CGCACGGACG 
CCCAAATGCA ACTGCGATTG CAGGCTTCGC 
CCTGGGAGAA GTCATTCAGG GCCCAGACTA 
GGGCATGAAG GACCGTCCAG GGCTGCAGTT 
GCAGCCTCTG TTCTCCGAGC CTCTTTGGAA 
AATACTCTTT TCCTCTCATC CCATCCCGGG 
TGCAGTCTTC CCTAACCTTT TCTTTGCTTC 
CCTCTCCCCT TGCCCAACTG GGGCTCCAGC 
CAGGGCCTCT CTGACACACA GGGTTGTAGC 
CTCTTTTGCT TCTGAGACTT AATTTTTTTC 
TCTCTGTACA GCCCTGGCTG CCCTGGCACT 
ACAAACCTAC CTGCCTCTGC CTTTCCAGTG 
AGTAGTTAAG TGTTTTGCTG TGTCTTTATT

CAGGACCCCC CAGAGAGTCC CCTTCCTGGC 3300
GTGATCATTG CCCAGGAGTG CAGACAGTGG 3360
GCTGAGGGGG GCGCCCAGGC AGGAAGCGGT 3420
CCCAGTTCAT GCCGAAGGAA TTCTGAATTA 3480
GCGGCCCCCT GGCCCCCGCC GCTCCGTCTG 3540
CTGAGACCTC CGCTGAGCCC TGGGACAAGC 3600
AAGACCCGCC TCCTCCCAAG GCCAAATTTG 3660
GAACCATGTT GGTGCCACCT CATCCATCTG 3720
TAGCTTCTTA ATAGGAACCT GGGGGTGGGT 3780
ATCGGTTTTG TTTTTGTTTT TGTTTTTTCC 3840
ACTGTTTTCC TCCCTAAGGG TTGAGAGCCC 3900
TACCCCAGGG CCTTTGCACA TGGAGTCCCA 3960
CTTACTGCAT TTGGCTCTTG GTAACTGTCC 4020
CCCAGCTCCC TCTCTTCTCC TCCCCCCTTT 4080
TTTTTCTTTT TGGCTTTTTG AGACAGGGTT 4140
CATTCTGTAG ACCAGGCTAG CCTCAAACTC 4200
CTGGCACTAA AGATGTGGGC CACCACAACT 4260
CCTATAGTGA CCTCAGTTCC TGGCATATTG 4320

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TAGGCGATGG ATGGATGAAT GGATGGATGG 
CTTGAATCGT CCTGAGTGAA AAAAGAGACC 
GGCAGCCTGG CCTGCTGGTC TCATGGGAGC 
CACCCTGCCA TCCTGTGTGG CTGACAAGAA 
AGGGAAGCTT GGAATATGTT CCCCTCCTCA 
CCAGCCTATG AGTAGGGCAG CTGTGGGCTG 
GTCCCTCAGG GTGGGTCACA GGATTGAGGT 
AGGAAATGAT TGTGGAGAGT CAGAACTCCT 
GCTTCTGTGG CTGTCCCTTC TCTTGTGGTC 
TGTGAGGAGG GCACGGGGAA AATGAAGGCT 
CCAACAGGGC TCACCTCTCC TCTGGACAGG 
TTTGATTCCC TTCCTTTGGT CTCCTGGGAT 
TTTTAGATAT GTCCATTCTC CAGAAACACA 
ACCACCAGGA CAGACAAAGA ATTGGAGAGG 
TGGCTTATGT GTAATCCCAG AACTCTGGAC 
CAGTGTGTTC TAGGTAATGA GACCCTGTCA 
ATGTTTATAG GCTGTGAGAC AGCTTGGTGG 
CCTCAGCCCC ATCCCTAGGA ATCCATGGTA 

Fig.3(ix)

ATGGATGGAT GGATGGTTGG ATGGAGCAAG 4380
TCAGAGAACT GAATGGAGTT AGGTTCCCAG 4440
TCCCTGTGAA ACTTCCCCCA CACCTCCCAC 4500
AGGCCAATGG CCAGATGGGG ACAGAGACTC 4560
TATCCTAGGC CTTGTTGTCC CCCTGAGGGC 4620
CCCTAAGGTT GGGTAGGCAA GAAGGGGGTG 4680
CATTTCCAAA GTGGCCATCA CAGTGGCCCT 4740
GTTGGGAGTT GTAGAGGGCC TTGCATGTGG 4800
CTTTGCACAG TCCCCTCGTG TGTGCTGGGA 4860
CAGCCCCTCA GCTTGCCCTT CACGGTTCAC 4920
CTCTCACTGT ATGCACAGAT TGGCCTCACA 4980
GACAAACATT TACCAGGGTA GGATTTTACA 5040
CTTGTGAGGT TAGGGTATCA GTGAAAGGAC 5100
AAGGAAATTG GTAAGCCAGG CCATGCTTGA 5160
GCTGAGGCAG GAGGATTCCA AGTTTCAAGA 5220
AGAAAAGAAA AGAAATAAAG AGACAAGAAA 5280
GTAAGGGGCA CTTGCCTCCA ATCAAGATGA 5340
GAAGGAGAAA GCAAACTCCA GCTGCTGACC 5400

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TCCATACATG TGCTCCAATG TGCACACACA 
TTTGCTTAGA TTTGAGTAGG CATTTATGAC 
GAAAATATAC CTGTTTGTAT TTGGTTTGGT 
GCTTCTCTGT GTAGTCCTGG CTGTCCTTGG 
ACTCAGAAAT CCGCCTGCTT GTGCTTCCCA 
TCAGCAAAAT TGCATACTTT AACCCCAGTA 
ATTCCAGGCT AGCCAAGGAT ACAGAGTGAG 
CCAAAATGTA TTTTGTGCTT GTGTATGTAC 
ACAACTTGTA GAAGTTCTCT CCGTTCACAG 
AGGCTTAGCC ACAGTCTTCT TTATGTACTG 
GAATTAATTT TTGAGATAAG GTCTCTTGTA 
AAGGTCATCT TGAGCTGCTG GTACTCTTGC 
GCAGCACTTC TCTGGGGAAG GGGCTGGCCT 
GAGTGCTTGG GTCTCGTTGT TTCTTTTCTT 
GACTTCCTGA CTCTTGAAAC ATCCAGGCAG 
GCCTAACAAA GTGTCGTCTT TGACCCCAGA 
CCTTCTCATC GGCTCCTCCC TGCAAGCTAC 
CACCGCTGAG GGGCTCTACT GGACCTTCAA 

Fig.3(xi)

CAGGGAGACA TAATCAATTA ATAGGATGTA 5460
TGATGTTTTA AAATTTTTAT TTGATTTTAT 5520
TTGGTTTGAG TTTTGTTTAT TTGAGACAGG 5580
AACTCACTCT GTAGACCAGG CTGGCCTTGA 5640
AGTGCTTAGA TTAAAGGTGT GCACTGCCAT 5700
TTTGGGAGGC AGAGGCAGAC TAATGTGTGA 5760
ACCCTATTCT TACCCTCCCC CCCCAAAACC 5820
ATGTGTGTTG CAGCACGTAA ATGTCCAAGG 5880
TCTAAGTCCT GAATTCAAAC TAAGGTCCTC 5940
AGCCATTTCA CTGGCCCTGG ATTGACTGAT 6000
GCTCTAGCTA GGCTCAAACT ATGAACTCCC 6060
TTCCACCCCA AGTGGTGGAA TGATACTCAG 6120
TGGCCTTGAT TTTGTTGCCT CAGCTTCAAT 6180
TATCTGTGAA ATGGGTGAAC ACCTGTTCAA 6240
GGTGAGGGAC TTGAAGTGGG CTCATCCCAT 6300
CACAGCTGTA ATCAGCCCCC AGGACCCCAC 6360
CTGCTCTATA CATGGAGACA CACCTGGGGC 6420
TGGTCGCCGC CTGCCCTCTG AGCTGTCCCG 6480

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CCTCCTTAAC ACCTCCACCC TGGCCCTGGC 
GTCAGGAGAC AATCTGGTGT GTCACGCCCG 
CTATGTTGGC TGTAAGTGGG GCCCCAGACA 
GATTTAGAGC CTGGGTCTTC TGTCCTGGGG
CATGGTCATA CCCAGCACAG GCATTGCAAC 
TGTGTACCCC ACAGCTTTAG AAAAGCTGTC 
CCTTTAACAT CAGCTGCTGG TCCCGGAACA 
GTGCACACGG GGAGACATTC TTACATACCA 
TACCCAGCCA AGCCTTGCTG TGTGACTTCT 
TTCCTGTTTA TGAACTCAAA AGGGACTCTC 
CACATGTGAG GAGTACCACA CTGTGGGCCC 
CCTCTTCACT CCCTATGAGA TCTGGGTGGA 
TGATGTCCTC ACACTGGATG TCCTGGACGT 
GCCCTAGACC TTATAGGGCG CCTCCCCCCC 
GTCTTAGCCA CAGCCACGGT GGTTGCAGGA 
TTTCCCCCAA GACAGTCAAG ATTTTCCCCT 
CTCTGCAGAG AACACCTGGC CTGACCACCC 
GAGTCCTAGG GGACTGAGAG GAGGCGCCCA

CCTGGCTAAC CTTAATGGGT CCAGGCAGCA 6540
AGACGGCAGC ATTCTGGCTG GCTCCTGCCT 6600
CTCAGAGATA GATGGGGGTT GGCAATGACA 6660
CAGAGCCATG GGCTCTCACT TGCATGCAGG 6720
TCTAGGGACA GCTGTGGCTG CACTGTCCCC 6780
ATGTTTTCCT TGTAGTGCCC CCTGAGAAGC 6840
TGAAGGATCT CACGTGCCGC TGGACACCGG 6900
ACTACTCCCT CAAGTACAAG CTGAGGTTGG 6960
GGCAATACTT ACCTTCTCTG ATCAAATATG 7020
GCACCTCCAC AGGTGGTACG GTCAGGATAA 7080
TCACTCATGC CATATCCCCA AGGACCTGGC 7140
AGCCACCAAT CGCCTAGGCT CAGCAAGATC 7200
GGGTGAGCCC CCAGTGTCCA CCTGTGTTCT 7260
ATCCCCCCAG ACTTTTTGGT TCTTCTAGAG 7320
CAGTGGTTGT TCATAACTTA ATGCAAAGAC 7380
CCCCACCCCC AACACACACA TACACACACA 7440
TCCCTCTCTA CAGCCCAGGT GTTCAGAAGG 7500
GGTCTGAAGG CGCCCCAGGA AGCCGAGGCC 7560

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TTGAGCTGGG GGGGGGGGCG AGGGTTGGAG 
GGGCCTAATC TAATTAGGGT GTTCCCAGCC 
GTGCCTCACT GAAGACTCAG GGGAGAGATC 
GGGTTCCTGG GTGCCCCTGG CTCATTCCCA 
TAACCCTCAG TTGTGCTCTG TGGCTGGCAC 
CAAGGCATCA GAGGTGGACA TGGGATGGGG 
AAGGTGGGGT GATATACAAT AAAGCTTGTC 
GATCACAATT GTTGACATCA CTCTGGGACA 
AGTAGCTTTA AGAGTCAGCT TGTGACTTAA 
GTGATGCTCG CCTCACTCCC TGTTTAGTGA 
GTGGGCTGCT CTGTCCCCTT GAGGGCAGGA 
TGGTAGCAGC AACTGCTGCT GGCTGTTTCT 
CTGGGTGAGT AGCTAACAGG GGTGGGGGCG 
AGCCACTGCA GCCTAGATTA CACCACTGGG 
AGTCCTCAGA ACTGGGAGCA CTGTTGCCAG 
AGGGGAGGCA GAGGCAGAAG GATCTCTCTG 
AGCTCCAGGC CAGCCAGGGT GCGCAGTAAA 
TGACCAGGCT TGCTCCACCC CCAGTGACCA

GCACGAACTG GATGATCCCT GAGCACAACT 7620
CAAAGCAGCC TGGGCCATTT AACCCTTCAA 7680
AGCTTGTACT CTCTCCATGG TCCCCCAGGA 7740
CATCCAGAGG TTTTGTGTCT TCCTGGCATC 7800
AGCTGCCCCG TGGAGGCTCT TGGTAATGTA 7860
ATACATAGGG ATGGAGCCAA ATAGCACCTC 7920
ACCCTGACGC TCAGAAAGCC TACTCATGAT 7980
TGTAGTGAGA CCCTAGCTCA AAACACAGAC 8040
TACTGGAACT CAGGGCCTAA TAGGTGCTGG 8100
GATCTCTGCG CTAATCTCCA CCCCAGCTGG 8160
ATGTGTGTCT TCCATCAGAG ATAGGACCCG 8220
GGAATATTAA ATGACAGTAA TCTATCAGGC 8280
TGGTCTGGAA AACGCAGATA GGGTCATAGG 8340
TGTTCTGTCA CTAGGCCATT CTCACCAAGC 8400
CATTTAATGC CAGCATTTAA TGCCAGCATT 8460
AGTTCAAGGC CATCCTGAAT TTACATAAAG 8520
ACCTTGTCTC AAAAAACAAA GCATCTTTAG 8580
CGGACCCCCC ACCCGACGTG CACGTGAGCC 8640

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GCGTTGGGGG CCTGGAGGAC CAGCTGAGTG 
ATTTCCTCTT CCAAGCCAAG TACCAGATCC 
AGGTGCCCGT CCCGCCCCGG ACCCGCCCCT 
CACCGTGCAG GTGGTGGATG ACGTCAGCAA 
GCCCGGCACC GTTTACTTCG TCCAAGTGCG 
AAAGGCGGGA ATCTGGAGCG AGTGGAGCCA 
TGAGCACCTC TCCAGGGCTG GCTGGCCCAT 
CCCACCCTTT TTTTGAGACA GCGTCTTCAG 
TAGTCAAGGA TGACCTCGAG CTCCTGGTCT 
GGCCATCACC ACCTTTGGGA GACTAGCCAT 
GATGGAGTAC AACAGTGTGA CCTCTTGTAA 
AATATCCTAG GCTCTCTAGA GGTTAACTTT 
TCACATGGTC CCACAGAACC TTTTGTCACA 
CACATAAGGG TCTCTACTGC TGGCCCACCC
CTTAATATTT GCAATCCTCC TACCTCAGCC 
CAAGTTTCTC TTCTCTGGGT CCCTTTCTTA 
GTCCTGAAGA CTCTCCGAGC CCATGGATCT 
AATGTCTGGC CTCAGTTTCC CCACCTGTCA

TGCGCTGGGT CTCACCACCA GCTCTCAAGG 8700
GCTACCGCGT GGAGGACAGC GTGGACTGGA 8760
GACCCCGCCC CCCGCATCTG ACTCCTCCCT 8820
CCAGACCTCC TGCCGTCTCG CGGGCCTGAA 8880
TTGTAACCCA TTCGGGATCT ATGGGTCGAA 8940
CCCCACCGCT GCCTCCACCC CTCGAAGTGG 9000
GGAATCCCCA ATCCATCCTG TTCCTTCCCC 9060
GTAGCGCATG CTGGCCTTAA ATTCAGTATG 9120
TTTTGTCTCC ACTTAGAGAC AATGGCCAGT 9180
GGAGTCTATT TAGCCTGTCA TTTGGTGACA 9240
GAGAACTGAA GACAGGCTGT TTTTAACCCC 9300
ATATAAAATA GAGACTATTA CAGCCAGTTA 9360
CAACCTATAG ACCACAGTGC CTGTGCCTAC 9420
CTCCAACCCT TAAAAGGTAA CCTAGGCAGC 9480
TCTTGAATGC TCAGAAACCA GGCATTAACC 9540
AGGTGGGAGG GCCTAAAGAT GACTTCCTTT 9600
GCACTCTCTA ATATGAAATA TATTGCATAA 9660
GGTTTAGGCA GCACAGTCGG TCCAAGACAC 9720



Fig.3(xviii) 




wo 98/11225 PCT/GB97/02479 




TTCATTATTT GCAGGCAGTA TAAGAAGAAG 
CTAAGACAGA ATACTTCTAC ACTGAAACTG 
TGATGATGAA ATAATGGGGA AACTGAGGCT 
ACCAGCTCCA GGAAGCTCTC CAGCCCCCAT 
GAGTGAACAC AGCTGGGAGG GGCTGGAGCC 
ACCTGCGATT CTTGCACGGG AGCCAGCAGG 
CCGGGGGTAG GGTTGGAGGG AGGTAAGCAG 
CCTGTCAGCG AGTCCCCAGT TTTATTTATG 
TGCTGGGGGA TGGCTGCGGC TGGGGATTGG 
CAGCCCACTC CATGTCACAC CCGTGCATTC 
TTCTGTGCTG TCTGTCTCTA TTTCTGTCAT 
TTAATATAAC TACGTTTTAA AAATTGCTTT 
GTGCCACAAC ACACACGTGA AGGTTAGAGA 
GGGACTAGGG CTGGCGACAA GAGCAATTAC 
CTTCCCATCC TGTTTGGATA GTCATAGGTA 
TAGCTATCCT GCCTCAGCCT ACCAAGTGCT 
TCCCAGTGTC TGGGGGTACA CAGTCCCAAG 
TGCCCCTTGC TTTGTCCGTG TCCCTAGAGT 

Fig.3(xix)

CTCCCATCCC CCACCCGCTT CCTCCGGTCC 9780
AACTCTCGCA GACGCATATG CTCACTTTAA 9840
CCGAGAGATT CCTGGAGGAA GAGGGTCAAA 9900
CCGGGCCTCT CCAGGTTCTG GGCTTGGCGG 9960
TGGGAGCTTT GGCCCTTGCT CGTGCCCAGC 10020
CGGCTGCGTC CGCCCGAGAG ACTGAAGAAG 10080
GGGCTGTGGG GGCCGAAGCT TGTGCCAGGG 10140
GCGTGAGGCC GATGTCCTTA TCCGCTGGCC 10200
ACCCAAGGGC TGGCTTCCCA CTCAGTCCTC 10260
TCTGAGGCTT ATCTTGGGAA CCCGCCCTTG 10320
TCACTTTCCC AGAGCCTTTT TTTTATGCTT 10380
TGTATAATGT GTGTGCCTTC GTGAGCGTGC 10440
ACTTTGTTGA GTAGGCTCCT TCCACCATGT 10500
TGAGTCATCT CGCCAGCCCC TCACCCCTCA 10560
ATCGAAGGTA AATCGCTGGC TTTAATTTCG 10620
GTGCTACCAC GTTTGTGGGA GGGGCTCTCC 10680
ATCTCTGCTT TCTAGGTCTT TGTCTTAGTT 10740
CTCCGGCCCC ACTTAGTCTC CATTGATTTC 10800

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CTTTCTGACC GAATACTCGG TTTTACCTCC 
CCATCGCCGT GGCATTGCCA TTCCTCTGGG 
CAACTTTCCC CAGCCGAAGC TGGTCTGGTA 
GCTGGCCGCG CCCCAACACT GCCGCTCCAT 
GGGTGTGCGA GCCGCGGGGC GGCGAGCCCA 
AGTTCCTCGG CTGGCTCAAG AAGCACGCAT 
ACCAGTGGCG TGCTTGGATG CAGAAGTCAC 
GGGAGGCTTG CGTGGGGGGT AAAGGAGCAG 
CACAACACCG CACTCTTCTT TCCAAGCACA 
GGGTGCGGCG AGAGGTAAGG GGGTCTGGGT 
CCTTTCCCCT CCTTCGGTGT TGCTCAAAGG 
AAGAGCCCCA GGTTTTACTG CATCATCAAG 
CTTTTCTGCC CTCAGGTCCT GCCGGCTAAA 
CAGACCTGGA GGCTCACCTG AATTGGAGCC 
TACCAGAGGC TGGGCACAAT GAGCTCCCAC 
ACTTGGATAT ACCCCAGTGT GGGTAGGGTT 
TTAAATAAAT AAAGGAGTTG TTCAGGTCCC 
GGGGTGGGGG GA 

Fig.3(xxi)

CACTGATTTG ACTCCCTCCT TTGCTTGTCT 10860
TGACTCTGGG TCCACACCTG ACACCTTTCC 10920
TGGGAGGCCG CCGTCCCGCG CGCGCCTCCT 10980
TCTCTTTAGA GCGCCCGGGC CCGGGCGGCG 11040
GCTCGGGCCC GGTGCGGCGC GAGCTCAAGC 11100
ACTGCTCGAA CCTTAGTTTC CGCCTGTACG 11160
ACAAGACCCG AAACCAGGTA GGAAAGTTGG 11220
AGGAAGAGAG AGACCCGGGT GAGCAGCCTC 11280
GGACGAGGGG ATCCTGCCCT CGGGCAGACG 11340
GAGTGGGGCC TACAGCAGTC TAGATGAGGC 11400
GATCTCTTAG TGCTCATTTC ACCCACTGCA 11460
TTGCTGAAGG GTCCAGGCTT AATGTGGCCT 11520
CTCTAAGGAT AGGCCATCCT CCTGCTGGGT 11580
CCTCTGTACC ATCTGGGCAA CAAAGAAACC 11640
AACCACAGCT TTGGTCCACA TGATGGTCAC 11700
GGGGTATTGC AGGGCCTCCC AAGAGTCTCT 11760
GATGGCCAGT GTGTTTGGGG CCTATGTGCT 11820
11832
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